SUMMARY

In many decision problems, the possible outcomes have several important
dimensions  of  value.  To  identify  the  optimal  alternative,  the  decision  analyst  must 
assess  the  decision  maker’s  preferences  over  these  multi-attribute  outcomes.  Two  rival 
procedures  for  solving  multi-attribute  preference  problems  currently  exist.  These  two 
procedures,  global  preference  modeling  and  local  preference  modeling,  each  have 
advantages  and  disadvantages.  This  dissertation  combines  these  two  rival  procedures  in  a 
new  approach  to  multi-attribute  decision  making.  This  new  combined  method,  called 
the proxy approach, uses the advantages of one technique to overcome the disadvantages
of  the  other. 

Global  preference  modeling  procedures  use  normative  assumptions  together  with  a 
few  assessments  to  construct  a single  function,  in  the  large,  ordering  preferences  over  all 
outcomes. These global functions are mathematically simple and convenient, but they
are very restrictive. The assumptions from which they are derived are reasonable locally,
in the small, but when assumed globally, in the large, they often produce functions not
truly  representing  the  decision  maker's  preferences. 

Local procedures provide an alternative approach that avoids restrictive
assumptions. Instead of constructing a single preference function in the large, these
procedures build a sequence of local preference models, in the small, each generating a
trial solution. Each trial solution is better than its predecessor, so the trial sequence
eventually reaches the optimum. Currently existing local procedures use successive linear
approximations; these linear functions are poor preference models, so the iterative
procedure is slow and inefficient. Since each iteration requires a time-consuming
interaction with the decision maker, the slowly converging procedure is not practical.

This dissertation combines the desirable features of the global and local techniques
in a new improved method. The normatively motivated preference models of the global
procedure are incorporated as proxy functions in a local procedure. These proxies are
better models of the true objective than are the linear approximations, so the resulting
trial sequence reaches the optimum much faster. The new proxy approach yields rapid
convergence  without  restrictive  assumptions. 

After  the  theoretical  aspects  of  the  proxy  approach  are  developed,  the  new 
algorithm is applied to a curriculum planning problem. This practical application was
successful; the decision maker, previously unfamiliar with decision analysis, was able to
provide the assessments required at each iteration. With the help of various consistency
tests, the tradeoff assessments generated trial solutions that converged rapidly to the
optimal solution. Numerous insights into the interactive use of the algorithm were
gained from this practical application.
Chapter I

INTRODUCTION 


1.1  Multi-Attribute  Decision  Analysis 

Decision  analysis  is  a normative  methodology  for  identifying  the  best  alternative  in 
a choice  situation.  In  the  first  step  of  an  analysis,  the  analyst  models  the  relationships 
among  the  actions  the  decision  maker  can  choose  and  the  resulting  outcomes  he  will 
face.  All  important  variables  influencing  the  outcome  must  be  included.  In  the  second 
step,  the  analyst  assesses  the  decision  maker's  preferences  over  the  outcome  variables  so 
the  optimal  decision  can  be  selected.  Figure  1.1,  taken  from  Howard  [16],  illustrates 
this  paradigm  for  the  individual  decision  maker. 

The decision variables $d_1, d_2, \ldots$ represent the choices available to the decision
maker. The state variables $s_1, s_2, \ldots$ represent the environmental factors affecting
the outcome. Information about the uncertain state variables is encoded in a joint
probability distribution $\{s_1, s_2, \ldots \mid e\}$, where $e$ represents the decision
maker's state of information. The state variables and decision variables interact through
the system model to produce the outcome lottery $\{x \mid d, e\}$, a function of the decision
vector $d$. Each possible outcome has a unique setting of the $x_i$'s in the multi-attribute
outcome vector $x$. The decision maker's preference ordering over these multi-attribute
outcomes is encoded in a mathematical utility function that provides a ranking of the
alternatives so the decision producing the highest expected utility
$\langle u \mid d, e \rangle$ can be selected.

Techniques  for  handling  the  single  attribute  problem  are  well  developed  and  have 
been  applied  successfully  in  numerous  cases  [5],[17].  The  multi-attribute  problem, 
however,  is  much  more  difficult  and  still  poses  a formidable  challenge  to  decision 
analysts.  In  this  dissertation,  I try  to  develop  an  improved  procedure  for  solving  the 
multi-attribute  problem  for  the  individual  decision  maker.  I take  the  state  variables, 
decision  variables,  and  system  model  as  given,  and  address  the  problem  of  selecting  the 
optimal  decision.  This  dissertation  is  an  attempt  to  find  a practical  procedure  to 
overcome several disadvantages of currently used multi-attribute techniques, with the
hope of enabling decision analysts to deal more effectively with complex decisions.

Figure 1.1 Paradigm for Multi-attribute Decision Analysis

1.2  Global  vs.  Local  Procedures 

Two  different  approaches  are  available,  at  least  theoretically,  to  solve  any 
multi-attribute  problem.  The  first  is  an  exhaustive  comparison  of  alternatives:  with  this 
approach,  the  analyst  assesses  preferences  at  every  possible  outcome,  thereby  obtaining 
directly  the  complete  preference  ordering.  This  exhaustive  search  requires  a large,  often 
infinite  number  of  assessments;  it  has  little  or  no  practical  application.  The  second 
approach  is  a modeling  technique;  the  analyst  uses  behavioral  assumptions  together  with 
a few  assessments  to  model  preferences  analytically.  Decision  analysts  use  this  modeling 
approach  since  it  provides  efficient  procedures  for  analyzing  multi-attribute  problems. 

I divide these preference modeling procedures into two broad categories: global
procedures  and  local  procedures.  Global  procedures  construct  a single  preference 
function  ordering  all  outcomes;  local  procedures  do  not  construct  such  a function.  When 
using  a global  procedure,  the  analyst  assesses  preferences  at  a few  outcomes  and  makes 
normative  assumptions  that  uniquely  specify  the  preference  ordering  at  all  other 
outcomes.  These  assumptions  restrict  the  preference  function  to  specific  families  of 
curves.  Once  the  family  is  selected,  a small  number  of  assessments  determine  the  free 
parameters.  The  resulting  preference  function  applies  over  the  entire  decision  region, 
hence  the  name  global  preference  function. 

These  global  functions  are  mathematically  simple  and  convenient,  but  they  have 
disadvantages.  The  assumptions  from  which  the  specific  functional  forms  are  derived 
are  very  strong  conditions.  They  are  reasonable  in  local  regions,  in  the  small,  but  when 
assumed  globally,  in  the  large,  they  are  very  restrictive.  There  may  be  instances  in  which 
a function  fit  from  an  assessment  in  the  small  adequately  represents  preferences  in  the 
large:  in  these  instances  the  global  procedures  should  be  used.  Generally,  however,  these 
global  procedures  force  the  decision  maker  to  fit  a function  not  truly  representing  his 
preferences.  If  the  decision  maker  has  a non-additive  preference  structure,  the  problem 
is  particularly  acute  since  all  commonly  used  preference  functions  have  additive 
deterministic forms. The Cobb-Douglas example in Figure 1.2 shows the restrictiveness
of  global  parameterization  from  local  assessment.  The  parameters  assessed  at  one  point 
must  characterize  behavior  over  the  entire  region.  Even  with  consistency  checking  at  a 
second  point,  one  set  of  parameters  must  ultimately  hold  everywhere. 
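
The restriction can be made concrete with a small sketch. It assumes a two-attribute Cobb-Douglas form V(x) = x1^a * x2^(1-a), which is one common parameterization and not necessarily the one drawn in Figure 1.2; the function names and the numbers are illustrative only. A single tradeoff assessed at one point fixes the exponent a, and that exponent then dictates the tradeoff at every other point.

```python
def fit_exponent(x1, x2, tradeoff):
    """Recover the exponent a from one assessed tradeoff.

    `tradeoff` is the amount of x1 the decision maker gives up per unit of x2
    at the point (x1, x2). For V = x1**a * x2**(1 - a) this equals
    ((1 - a) / a) * (x1 / x2).
    """
    ratio = tradeoff * x2 / x1          # equals (1 - a) / a
    return 1.0 / (1.0 + ratio)


def implied_tradeoff(a, x1, x2):
    """Tradeoff the fitted global function imposes at any other point."""
    return (1.0 - a) / a * x1 / x2


# One local assessment: at (10, 10) the decision maker trades one for one.
a = fit_exponent(10.0, 10.0, 1.0)

# The global form now dictates the tradeoff at (40, 10), whether or not
# the decision maker would agree with it.
forced = implied_tradeoff(a, 40.0, 10.0)
print(a, forced)
```

If the decision maker's actual tradeoff at the second point differs from the forced value, no choice of a can reconcile the two assessments within this family.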

These  disadvantages  of  global  preference  modeling  motivate  the  search  for  an 
alternative  approach.  Local  procedures,  the  second  category  of  preference  modeling 
techniques,  provide  this  alternative.  If  we  are  willing  to  construct  a series  of  local 
preference  models,  in  the  small,  we  can  develop  an  iterative  procedure  that  scans  the  set 
of  alternatives  to  locate  the  optimum.  At  each  iteration  of  this  procedure,  the  local 
preference  model  provides  a trial  solution.  Each  trial  solution  is  better  than  its 
predecessor,  so  the  sequence  eventually  reaches  the  optimum.  This  iterative  technique 
avoids  the  strong  restrictions;  it  never  specifies  a global  preference  function.  Local 
procedures,  however,  require  more  assessments  than  global  procedures,  often  too  many 
assessments  to  be  practical.  This  dissertation  is  an  attempt  to  find  a local  procedure  to 
solve  multi-attribute  problems,  avoiding  the  restrictions  of  global  preference  modeling, 
and  at  the  same  time,  requiring  only  a reasonably  small  number  of  assessments. 

1.3  Outline  of  Thesis 

Dean  Boyd  [4]  made  the  first  attempt  to  solve  this  basic  problem  cast  in  a decision 
analysis  framework.  He  developed  a procedure  using  local  tradeoff  assessments  to 
parametrize  local  linear  approximations  of  the  true  preference  function.  His  procedure 
avoids  global  restrictions,  but  requires  too  many  assessments  to  be  practical.  I review 
Boyd's  thesis  and  other  related  literature  in  Chapter  II. 

My  major  contribution  is  a merger  of  the  global  and  local  procedures.  I develop  a 
new  algorithm  that  incorporates  the  desirable  features  of  both  techniques;  it  uses  the 
normatively motivated preference models of the global procedure as proxy functions in a
local procedure. These proxies are better models of the true preference function.
Therefore, the sequence of trial solutions they generate reaches the optimum much
faster. This new proxy algorithm uses the advantages of one technique to overcome the
disadvantages of the other; the result is a combined procedure that improves both
original  techniques.  Chapter  III  presents  the  theoretical  aspects  of  my  methodology  for 
decision  making  under  certainty.  It  includes  several  applications  of  optimization 
theorems  that  help  solve  the  multi-attribute  decision  interpreted  as  a resource  allocation 
problem.  Chapter  III  also  includes  tests  of  assessment  consistency  that  keep  the  trial 
sequence  from  going  astray. 

Chapter  IV  compares  the  new  algorithm  to  the  old.  Convergence  comparisons  show 
the  new  algorithm  is  much  faster;  it  requires  few  enough  assessments  to  be  practical  for 
decision  making  under  certainty. 

In  Chapter  V,  I examine  the  proxy  approach  under  uncertainty.  Theoretical  aspects 
of  decision  making  under  uncertainty  present  major  obstacles.  Consequently,  the  proxy 
approach  in  its  current  form  is  not  useful  for  problems  in  which  uncertainty  plays  a 
major  role. 

The  true  practical  test  of  any  theoretical  decision-making  procedure  is  a real 
problem.  Chapter  VI  describes  the  application  of  the  proxy  approach  to  a curriculum 
planning  problem  of  a small  private  school  in  San  Jose,  California.  This  practical 
application  was  successful;  it  provided  numerous  insights  into  the  interactive  use  of  the 
procedure. 

Chapter  VII  summarizes  the  key  results  of  the  thesis  and  includes  suggestions  for 
future  research. 
Chapter II

RELATED LITERATURE

2.1  Axioms  of  Deterministic  and  Risk  Preference 

Most  treatments  of  decision  theory  begin  with  the  assumption  that  deterministic 
preferences  can  be  represented  by  a set  of  three  binary  relations  defined  over  all 
outcomes $x$. The three relations are:

i. Strict Preference $\succ$

$x^1 \succ x^2$ if $x^1$ is strictly preferred to $x^2$

ii. Indifference $\sim$

$x^1 \sim x^2$ if $x^1$ is indifferent to $x^2$

iii. Weak Preference $\succsim$

$x^1 \succsim x^2$ if $x^1 \succ x^2$ or $x^1 \sim x^2$

Four  axioms  governing  the  decision  maker's  behavior  under  certainty  are  listed 
below.  All  are  quite  standard  and  can  be  found  in  most  developments  of  preference 
theory  [23]. 

Axiom 2.1. Weak ordering. The relation $\succsim$ is transitive and connected; $\succsim$ is
transitive if $x^1 \succsim x^2$ and $x^2 \succsim x^3$ imply $x^1 \succsim x^3$; $\succsim$ is
connected if $x^1 \succsim x^2$ or $x^2 \succsim x^1$ for all $x^1$ and $x^2$. This axiom
prevents the decision maker from being a "money pump". A violation of transitivity
implies a willingness to pay to accomplish nothing.

Axiom 2.2. Continuity. If $x^1 \succ x^2$ and $x^2 \succ x^3$, then there is a real number $c$,
$0 < c < 1$, such that $c x^1 + (1-c) x^3 \sim x^2$. This axiom indicates the decision maker is
willing to make tradeoffs.

Axiom 2.3. Nonsatiety. If $x_i^1 \ge x_i^2$ for all $i$ and $x_j^1 > x_j^2$ for some $j$, then
$x^1 \succ x^2$. This axiom means the individual prefers more to less of each attribute. The
outcome attributes must be modeled so each $x_i$ is a desirable good.

Luce and Suppes [23] proved that Axioms 2.1-2.3 guarantee the existence of a
continuous real-valued deterministic preference function $V(x)$ such that $V(x^1) > V(x^2)$
if and only if $x^1 \succ x^2$. The function $V(x)$ is unique up to a monotonic
increasing transformation.
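
The uniqueness statement can be illustrated numerically. The sketch below uses a hypothetical preference function (a sum of square roots, chosen only for illustration) and an arbitrary increasing transformation of it; both induce the same ranking of a handful of made-up outcomes.

```python
import math


def value(x):
    # Hypothetical deterministic preference function, assumed for this demo.
    return math.sqrt(x[0]) + math.sqrt(x[1])


def transformed_value(x):
    # Any strictly increasing transformation of value(); exp plus a constant here.
    return math.exp(value(x)) + 7.0


# Made-up two-attribute outcomes with distinct preference values.
outcomes = [(1, 9), (4, 16), (0, 1), (25, 4), (2, 2)]

ranking_v = sorted(outcomes, key=value, reverse=True)
ranking_t = sorted(outcomes, key=transformed_value, reverse=True)
print(ranking_v == ranking_t)
```

Because only the ordering matters under certainty, the analyst is free to pick whichever representative of this family is most convenient.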

Axiom 2.4. Decreasing Marginal Rates of Substitution. For any $i$ and $j$, the
amount of $x_i$ traded for each additional $\Delta x_j$ decreases as $x_j$ increases. Assuming
differentiability, $\partial[-(dx_i/dx_j)|_{dV=0}]/\partial x_j < 0$. This axiom states that the
decision maker becomes less sensitive to incremental changes $\Delta x_j$ as his amount of
$x_j$ increases.

These  four  axioms  provide  the  foundation  for  mathematical  preference  structures 
for  deterministic  decision  making.  In  the  remainder  of  this  thesis,  we  refer  to  them  as 
the  deterministic  preference  axioms.  Analogous  results  guaranteeing  the  existence  of  a 
real-valued  risk  preference  function  for  decision  making  under  uncertainty  were 
pioneered by von Neumann and Morgenstern [27] and later revised by Savage [31]. A
convenient  form  of  the  Savage  axioms,  taken  from  Howard  [18],  is  listed  below.  In  the 
reference lottery $[p_1, x;\ p_2, y;\ p_3, z]$, $p_1$, $p_2$, and $p_3$ are the probabilities of
prizes $x$, $y$, and $z$, respectively.

Axiom 2.5. Orderability. For all $x$ and $y$, either $x \succ y$, $x \sim y$, or $x \prec y$.

Axiom 2.6. Continuity. If $x \succ y \succ z$, then there is a real number $p$, $0 < p < 1$,
such that $[1, y] \sim [p, x; (1-p), z]$. The quantity $y$ is called the certain equivalent of
the lottery.

Axiom  2.7.  Substitutability.  A lottery  and  its  certain  equivalent  are  interchangeable 
with  no  change  in  preferences. 

Axiom 2.8. Monotonicity. If $x \succ y$, then $[p, x; (1-p), y] \succ [p', x; (1-p'), y]$ if
and only if $p > p'$.

Axiom 2.9. Decomposability. $[p, [q, x; (1-q), y]; (1-p), y] \sim [pq, x; (1-pq), y]$.

In the remainder of this thesis, we refer to Axioms 2.5-2.9 as the risk preference
axioms. 
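
The probability arithmetic behind the decomposability axiom can be checked with a short sketch. The list-of-pairs representation of a lottery and the function name are assumptions made for this illustration; the axiom itself asserts indifference, whereas the code only confirms that the compound and reduced lotteries assign the same probabilities to the prizes.

```python
def reduce_lottery(lottery):
    """Flatten a possibly nested lottery into {prize: probability}.

    A lottery is a list of (probability, prize) pairs; a prize may itself
    be a lottery in the same format.
    """
    flat = {}
    for prob, prize in lottery:
        if isinstance(prize, list):      # the prize is itself a lottery
            for inner_prize, inner_prob in reduce_lottery(prize).items():
                flat[inner_prize] = flat.get(inner_prize, 0.0) + prob * inner_prob
        else:
            flat[prize] = flat.get(prize, 0.0) + prob
    return flat


p, q = 0.5, 0.4
compound = [(p, [(q, "x"), (1 - q, "y")]), (1 - p, "y")]
reduced = reduce_lottery(compound)
print(reduced)
```

With p = 0.5 and q = 0.4 the prize x is won with probability pq = 0.2 and y with probability 1 - pq = 0.8, matching the right-hand side of Axiom 2.9.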
2.2 State-of-the-Art Techniques

The  literature  on  multi-attribute  utility  theory  is  voluminous;  it  includes 
contributions  from  economists,  decision  analysts,  and  mathematical  psychologists.  In 
this  thesis,  I review  only  those  techniques  that  rest  upon  a normative  foundation  and 
relate  directly  to  my  results.  Within  the  decision  analysis  profession,  there  are  currently 
two  different  approaches  to  assessing  multi-attribute  utility  functions.  One  approach  is 
associated  with  the  Stanford  Research  Institute  and  Stanford  University  and  the  other 
with Harvard and M.I.T. Both procedures lead to a multi-attribute utility function, but
the viewpoints of risk attitude and methods of assessment are quite different.

The Stanford group views risk attitude as a one-dimensional phenomenon separate
from  deterministic  tradeoffs  among  attributes.  Figure  2.1  illustrates  this  decomposition 
approach.  The  deterministic  preference  axioms  guarantee  the  existence  of  a 
deterministic  preference  function  V(x)  that  rank  orders  preferences  over  all  possible 
multi-attribute outcomes. This preference function is mapped into a scalar
order-preserving numeraire function n(x). Using the risk preference axioms, the analyst
assesses  risk  attitude  over  the  one-dimensional  numeraire  to  specify  the  multi-attribute 
risk  preference  function  u[n(x)]. 

Barrager  and  Keelin  have  written  the  most  recent  dissertations  [3], [20]  on 
multi-attribute  utility  theory  in  the  Stanford  decision  analysis  research  program.  Using 
the normative axioms (2.1-2.9) as a foundation, they specify additional assumptions that
limit the form of the preference function. Barrager suggests properties over
multi-period consumption preferences from which he derives the sum-of-exponentials,
Cobb-Douglas, and sum-of-powers preference functions. Keelin generalizes Barrager's
procedure and adds new preference parameters to encode utility functions for general
multi-attribute problems. The additional assumptions they require for the
sum-of-exponentials, sum-of-powers, and Cobb-Douglas functions are listed in
Appendix B. These assumptions are restrictive since the same form of the preference
parameter (the marginal value reduction coefficient) must hold everywhere. Of all the
assumptions they propose that lead to specific families of curves, these three are the least
restrictive. This global modeling technique provides mathematically simple functions,
but it is built upon the restrictive assumptions we are trying to avoid.

Figure 2.1 Decomposition of Deterministic and Risk Preference

Howard  Raiffa  and  Ralph  Keeney  [21],  at  Harvard  and  M.I.T.,  and  Peter  Fishburn 
[11],  at  the  Research  Analysis  Corporation,  advocate  a methodology  in  which 
deterministic  and  risk  preferences  are  jointly  assessed.  They  use  a hierarchy  of 
independence  assumptions  among  subsets  of  attributes  to  restrict  the  preference  function 
to  additive  or  multiplicative  forms.  From  the  Stanford  viewpoint,  their  procedure  has 
one  major  drawback:  it  requires  inference  of  deterministic  tradeoffs  from  probabilistic 
questions.  The  risk  additive  independence  and  utility  independence  conditions  from 
which  they  derive  the  additive  and  multiplicative  utility  functions  are  listed  in  Appendix 
B.  My  discussion  of  the  Keeney-Raiffa  methodology  is  brief  since  my  research  does  not 
draw  upon  it  directly,  but  a detailed  account  is  available  in  their  forthcoming  book  [21]. 

Conjoint  measurement  is  a third  approach  leading  to  global  deterministic  preference 
functions.  With  this  technique,  the  decision  maker  rank  orders  different  combinations 
of  attributes  "considered  jointly".  A regression  routine  then  assigns  to  each  discrete  level 
of  each  attribute  a numerical  value  that  minimizes  the  errors  of  the  rank  orderings 
according  to  an  additive  or  multiplicative  value  model.  The  global  restriction  enters  this 
technique through the underlying value model. Conjoint measurement and the closely
related technique of multidimensional scaling have been applied recently to study
consumer preferences in marketing research [19],[22],[24].

All  three  multi-attribute  procedures  described  above  are  built  upon  strong 
assumptions that limit the form of the preference function. In the next section, I
examine  techniques  designed  to  avoid  these  restrictions. 


2.3  Boyd's  Successive  Approximation  Algorithm 

Dean Boyd [4] developed a multi-attribute procedure that does not construct a
global preference function. Instead, it uses local models to generate a sequence of trial
solutions that eventually converge to the optimum. We will first consider decision
making under certainty to see how his procedure operates.

Boyd makes the following assumptions:

i. The deterministic preference axioms hold, so a concave and differentiable
deterministic preference function $V(x)$ exists. However, $V(x)$ is unknown and
assessment of the entire $V(x)$ is impossible.

ii. The attributes are modeled so negative quantities are meaningless;
therefore $x \ge 0$.

iii. The decision maker's resources are bounded by the convex constraint
set $X \equiv \{x \mid h(x) = 0,\ g(x) \le 0,\ x \ge 0\}$. (This characterization of the feasible
region is a standard nonlinear programming formulation that will be useful in
establishing later results.) The set of feasible decisions $D$ is a subset of N-dimensional
Euclidean space, $D = \{d \mid x(d) \in X\}$.

Under  these  assumptions,  Boyd  tries  to  solve  the  following  problem: 

$$\max_{d}\ V[x(d)] \quad \text{subject to } d \in D. \tag{2.1}$$

This  notation  indicates  he  is  trying  to  choose  the  decision  d*  that  leads  to  the  most 
preferred  outcome  x(d*).  Figure  2.2  illustrates  the  following  simple  two-dimensional 
case  (x  is  understood  to  be  a function  of  the  decision  d): 

$$\max\ V(x_1, x_2) \quad \text{subject to } c_1 x_1 + c_2 x_2 \le b \text{ and } x_1, x_2 \ge 0.$$

Pretending $V(x)$ is known, we draw its isovalue curves, projected onto the $x_1 x_2$ plane.
Each isovalue curve is a locus of points among which the decision maker is indifferent;
economists refer to these loci as indifference curves. By the definition of $V(x)$,
prospects on a higher curve, $V(x) = k_2$, are preferred to prospects on a lower curve,
$V(x) = k_1$, where $k_2 > k_1$. With Axiom 2.4, we assumed the marginal rates of
substitution are decreasing along each indifference curve. Since the slope
$dx_1/dx_2|_{dV=0}$ is increasing in $x_2$, but always negative and finite, the indifference
curves are convex to the origin. In this problem, Boyd is trying to maximize $V(x)$
subject to the constraints. He can find this maximum by identifying the point where an
indifference curve is just tangent to the constraint.

Boyd's first step is tradeoff assessment; he determines the $\Delta x_1$ at which the
decision maker is indifferent between $(x_1, x_2)$ and $(x_1 - \Delta x_1, x_2 + \Delta x_2)$.
Figure 2.3 shows that this tradeoff $\Delta x_1/\Delta x_2$ is the negative slope of the tangent
to the indifference curve. It expresses the amount of $x_1$ the decision maker is willing to
sacrifice in order to gain a unit increment of $x_2$, leaving all other attributes constant.
The tradeoff $\lambda_j[x(d)]$ is the marginal rate of substitution of $x_1$ for $x_j$ at $x(d)$,
using $x_1$ as a price variable:

$$\lambda_j(x) = -\left.\frac{dx_1}{dx_j}\right|_{dV=0}$$

To assess tradeoffs at any point $x(d)$, the analyst presents the following prospects
to the decision maker:

$x = [x_1, x_2, \ldots, x_j, \ldots, x_N]$

$x' = [x_1 - \Delta x_1, x_2, \ldots, x_j + \Delta x_j, \ldots, x_N]$

For a small fixed $\Delta x_j$, small enough so the indifference curve is approximately linear
but large enough so the increment is meaningful, the analyst varies $\Delta x_1$ until the
decision maker is indifferent between $x$ and $x'$. At this level, $\lambda_j[x(d)] \approx
\Delta x_1/\Delta x_j$. Since $\lambda_1$ is always one, there are only $N-1$ degrees of freedom
among tradeoffs at any point. Consistency can be checked by assessing a second set of
tradeoffs with a different price variable $x_k$ since the chain rule implies
$\lambda_{ij} \lambda_{jk} = \lambda_{ik}$.
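
The assessment and the consistency check can be sketched in code. Everything below is a simulation under stated assumptions: `hidden_value` stands in for the unknown preference function, `prefers` plays the role of the decision maker answering a pairwise comparison, and the bisection is just one plausible way for the analyst to vary the sacrificed amount. The index convention (`price` first, then the attribute gained) is also an assumption of this sketch.

```python
import math


def hidden_value(x):
    # Stand-in for the decision maker's unknown V(x); assumed for the demo only.
    return sum(w * math.sqrt(xi) for w, xi in zip((1.0, 2.0, 3.0), x))


def prefers(a, b):
    """Simulated decision maker: True if outcome a is preferred to outcome b."""
    return hidden_value(a) > hidden_value(b)


def assess_tradeoff(x, price, j, dxj=0.01, steps=60):
    """Units of x[price] given up per unit of x[j], found by pairwise comparisons."""
    lo, hi = 0.0, x[price]
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        trial = list(x)
        trial[price] -= mid          # sacrifice some of the price attribute
        trial[j] += dxj              # gain a small fixed increment of attribute j
        if prefers(trial, x):
            lo = mid                 # still better off: can give up more
        else:
            hi = mid                 # gave up too much
    return 0.5 * (lo + hi) / dxj


x = (4.0, 4.0, 9.0)
t01 = assess_tradeoff(x, price=0, j=1)
t02 = assess_tradeoff(x, price=0, j=2)
t12 = assess_tradeoff(x, price=1, j=2)   # second set, different price variable

# Chain-rule consistency: the product of the first two should match the third.
print(t01, t12, t02, abs(t01 * t12 - t02))
```

With a real decision maker the residual in the last line would not be near zero automatically; a large value would signal inconsistent assessments that should be revisited before the next optimization.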

Boyd defines the pseudo-objective function $h[x(d) \mid x(d^k)]$ as a scaled linear
approximation of the true preference function fit at $x(d^k)$. Taking the first-order
Taylor expansion of $V$ at $x(d^k)$, dividing by $[\partial V(x)/\partial x_1]_{x = x(d^k)}$, and
subtracting the constant terms, Boyd defines

$$h[x(d) \mid x(d^k)] = \sum_i \left( \frac{\partial V(x)/\partial x_i}{\partial V(x)/\partial x_1} \right)_{x = x(d^k)} x_i(d) = \sum_i \lambda_i[x(d^k)]\, x_i(d)$$

where $\lambda_i$ is the tradeoff using $x_1$ as the price variable. This pseudo-objective is not
the true preference function; it is a linear approximation valid in a small neighborhood
of $x(d^k)$. Boyd proves the following theorem:
Theorem 2.1. If the decision maker's preference ordering satisfies the deterministic
preference axioms (2.1-2.4), and if $X(D)$ is convex, then if $d^*$ maximizes
$h[x(d) \mid x(d^*)]$ over all $d \in D$, then $d^*$ also maximizes $V[x(d)]$ for all $d \in D$.

If the pseudo-objective fit at $d^*$ achieves its maximum at $d^*$, then the true
objective also achieves its maximum at $d^*$. Figure 2.4 shows that at a non-optimal
decision $d^k$, the gradient of the pseudo-objective has a component in the feasible
region. However, at $x(d^*)$, there is no feasible direction of improvement, so $d^*$ is the
optimum. This simple illustration of the Kuhn-Tucker conditions (see Appendix C)
shows maximization of the first order pseudo-objective at $d^*$ is a sufficient condition
for maximization of the true objective. Since the linear pseudo-objective is concave and
the feasible region is convex, the condition is necessary as well. A formal proof of a
more powerful version of this theorem is included in Chapter III.
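
The construction of the pseudo-objective, and the reason its maximizer carries information about the true objective, can be written out step by step. The derivation below is a sketch that takes the attributes themselves as the variables and assumes $\partial V/\partial x_1 > 0$ at the fitting point, as nonsatiety suggests; it is not a substitute for the formal proof in Chapter III.

```latex
\begin{align*}
V(x) &\approx V(x^k) + \sum_i \left.\frac{\partial V}{\partial x_i}\right|_{x^k} (x_i - x_i^k)
  && \text{first-order Taylor expansion at } x^k \\
\frac{V(x)}{\partial V/\partial x_1 |_{x^k}} &\approx \text{const}
  + \sum_i \frac{\partial V/\partial x_i |_{x^k}}{\partial V/\partial x_1 |_{x^k}}\, x_i
  && \text{divide by } \partial V/\partial x_1 |_{x^k} > 0 \\
h[x \mid x^k] &= \sum_i \lambda_i(x^k)\, x_i
  && \text{drop the constant terms} \\
\nabla_x h[x \mid x^k] &= \lambda(x^k)
  = \frac{\nabla V(x^k)}{\partial V/\partial x_1 |_{x^k}}
  && \text{gradients are collinear at } x^k
\end{align*}
```

Since the two gradients point in the same direction at the fitting point, any feasible direction that improves the pseudo-objective there also improves the true objective to first order, and the absence of such a direction at $x(d^*)$ is what the theorem exploits.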

This  theorem  by  itself  does  not  help  solve  the  decision  problem  since  it  requires 
prior  knowledge  of  the  optimum.  However,  it  serves  as  the  backbone  of  Boyd's 
successive  approximation  algorithm  outlined  below: 

Step 0. Choose an arbitrary $d^0$ and assess tradeoffs. Let $k = 0$.

Step 1. Fit the pseudo-objective function using tradeoffs at $d^k$ and maximize
$h[x(d) \mid x(d^k)]$ over all $d \in D$. This maximization yields a new $d^{k+1}$.

Step 2. If $d^{k+1} = d^k$, stop; $d^{k+1}$ is the optimum (by Theorem 2.1).

If $d^{k+1} \ne d^k$, assess tradeoffs at $d^{k+1}$. Let $k = k+1$ and return to Step 1.

This algorithm generates a sequence of points hopefully converging to the
optimum. Figure 2.5 shows a simple example; the underlying indifference curves are
shown for illustrative purposes (in our real problem, these curves are unknown). After
arbitrarily selecting $d^0$ as an initial feasible point, the analyst assesses the marginal rate
of substitution of $x_1$ for $x_2$ at $x(d^0)$ to determine the local linear approximation
$h[x(d) \mid x(d^0)]$. Figure 2.5a shows this linear pseudo-objective is maximized at $x(d^1)$.
Since $d^1 \ne d^0$, the new point $x(d^1)$ is used in the next iteration. The tradeoff
assessment at $x(d^1)$ yields a new pseudo-objective $h[x(d) \mid x(d^1)]$ which achieves its
maximum at $x(d^2)$. Figure 2.5b shows $x^2$ lies on a lower indifference curve, implying
$x^1 \succ x^2$. For the algorithm to converge, each iteration must provide a new point
preferred to its predecessor. The prospect $x^2$ is not an improvement, but it indicates a
feasible direction of improvement. Consequently, a new point preferred to $x^1$ can be
chosen somewhere along the line segment from $x^1$ to $x^2$. Figure 2.5b shows a new
point chosen in this manner; this new trial solution lies on a higher indifference
curve than $x^1$. The iterative procedure continues until $d^{k+1} = d^k$. Figure 2.5c shows
that $h[x(d) \mid x(d^*)]$ is maximized at $x(d^*)$; no further improvement can be made, so
$d^*$ is the optimum.
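
The steps of the algorithm, together with the relaxation along the line segment, can be sketched for the two-attribute budget problem. This is a toy simulation under several assumptions, not Boyd's implementation: the decision is taken to be the outcome itself, `hidden_value` stands in for the unknown preference function, `assess_tradeoffs` and `prefers` simulate the decision maker's answers, the budget data are made up, and the segment search simply halves the step until a preferred point appears. The stopping test checks whether the current point already maximizes the pseudo-objective, which is the condition of Theorem 2.1.

```python
import math

C = (1.0, 1.0)   # assumed attribute costs c_1, c_2
B = 10.0         # assumed budget b


def hidden_value(x):
    # Stand-in for the unknown V(x); a real analysis never evaluates this directly.
    return math.log(1.0 + x[0]) + math.log(1.0 + x[1])


def assess_tradeoffs(x):
    # Simulated assessment: lambda_i = (dV/dx_i) / (dV/dx_1), so lambda_1 = 1.
    grad = [1.0 / (1.0 + xi) for xi in x]
    return [g / grad[0] for g in grad]


def prefers(a, b):
    # Simulated pairwise comparison by the decision maker.
    return hidden_value(a) > hidden_value(b)


def pseudo_objective(lam, x):
    return sum(l * xi for l, xi in zip(lam, x))


def maximize_pseudo_objective(lam):
    # A linear objective over {c.x <= b, x >= 0} peaks at a vertex of the region.
    vertices = [(0.0, 0.0), (B / C[0], 0.0), (0.0, B / C[1])]
    return max(vertices, key=lambda v: pseudo_objective(lam, v))


def successive_approximation(x, tol=1e-9, max_iter=100):
    trials = [tuple(x)]
    for _ in range(max_iter):
        lam = assess_tradeoffs(x)                      # Step 1: fit at x^k
        target = maximize_pseudo_objective(lam)        # Step 1: maximize h
        # Step 2: stop if x^k already maximizes the pseudo-objective.
        if pseudo_objective(lam, target) - pseudo_objective(lam, x) <= tol:
            break
        step = 1.0
        while step > tol:                              # relaxation along the segment
            trial = tuple(xi + step * (ti - xi) for xi, ti in zip(x, target))
            if prefers(trial, x):
                break
            step /= 2.0
        else:
            break                                      # no preferred point found
        x = trial
        trials.append(x)
    return trials


trials = successive_approximation((1.0, 4.0))
print(trials)
```

On this symmetric toy problem the trial sequence is (1, 4), (10, 0), (5, 5), and the last point is the optimum. Less symmetric preference functions generally need many more iterations, each one costing a fresh round of tradeoff assessments.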

Successive  linear  approximation  of  a nonlinear  objective  function  is  not  a new 
technique.  The  method  was  originally  proposed  by  Frank  and  Wolfe  [12]  in  1956  for 
problems  with  linear  constraints.  In  the  original  Frank-Wolfe  algorithm,  each  iteration 
used an exact line search to find the next point. The procedure has since been applied
with  inexact  line  searches  to  problems  with  linear  and  nonlinear  constraints.  Boyd's 
contribution  was  the  application  of  this  modified  Frank-Wolfe  algorithm  to  the 
multi-attribute  decision  in  which  the  preference  function  is  unknown.  Unable  to 
evaluate the gradient of $V$ at each $x^k$ (since $V$ is unknown), he assesses the decision
maker's tradeoffs at $x^k$. These tradeoffs provide a vector $\lambda(x^k)$ collinear with
$\nabla V(x^k)$ since

$$\lambda_j(x^k) = \frac{\partial V(x^k)/\partial x_j}{\partial V(x^k)/\partial x_1}.$$

Maximizing the pseudo-objective $\lambda(x^k)^T x$ is equivalent to maximizing
$\nabla V(x^k)^T x$ in the Frank-Wolfe algorithm since the scaling factor
$\partial V(x^k)/\partial x_1$ is a positive constant.

In  summary,  Boyd’s  procedure  avoids  global  preference  modeling  by  using  a 
sequence  of  tradeoff  assessments,  local  linear  models,  and  optimizations.  However,  Boyd 
pays  a heavy  price  to  avoid  the  restrictive  assumptions;  his  iterative  procedure  requires 
many  more  assessments  and  optimizations.  It  converges  very  slowly  in  general  since  the 
linear  pseudo-objectives  are  very  poor  preference  models  even  in  the  small.  The  rate  of 
convergence  decreases  with  increasing  nonlinearity  of  the  true  preference  function;  in 
many  problems  the  true  preference  function  is  highly  nonlinear.  We  must  ask  ourselves 
if Boyd's algorithm solves our basic problem. In theory, it does, but in practice, it does
not, since the large number of iterations makes the assessment demands prohibitive.

Geoffrion,  Dyer,  and  Feinberg  [8], [14]  independently  developed  an  interactive 
procedure  similar  to  Boyd's  successive  approximation  algorithm.  Their  scheme  also  uses 
the  Frank-Wolfe  linear  approximation  technique  at  each  iteration.  However,  they  do 
not ask the decision maker if $x^{k+1}$ is preferred to $x^k$. Rather, they require the decision
maker to choose his most preferred point along the line segment from $x^k$ to $x^{k+1}$. The
key  difference  is  that  Geoffrion,  Dyer,  and  Feinberg’s  procedure  requires  the  decision 
maker  to  choose  among  many  outcomes  all  at  once,  in  contrast  to  Boyd’s  scheme 
requiring  comparison  of  only  two  points  at  a time.  The  decision  maker's  task  is  more 
difficult,  but  the  relaxation  procedure  is  avoided.  Dyer  [9]  and  Hogan  [15]  investigate 
the  convergence  of  the  modified  Frank-Wolfe  method  using  an  approximation  of  the 
true  gradient  at  each  step.  I will  use  several  of  their  convergence  analyses  in  Chapters  III 
and  IV. 

Wehrung  [33]  establishes  theoretical  foundations  for  interactive  identification  and 
optimization  of  preferences.  For  both  numerical  and  qualitative  forms  of  preference,  he 
examines  elicitation  procedures  and  mathematical  programming  techniques  that  guide  the 
search  and  indicate  trial  points  where  the  decision  maker  should  identify  his 
preferences.  Tests  monitoring  the  consistency  of  the  assessed  information  with  the  basic 
rationality  conditions  are  also  developed.  Wehrung's  results  are  entirely  theoretical;  he 
makes  no  attempt  in  his  dissertation  to  develop  a practical  procedure. 

My  dissertation  draws  bits  and  pieces  from  most  of  the  literature  reviewed  in  this 
chapter  and  from  optimization  theory,  but  the  primary  motivation  was  Dean  Boyd’s 
thesis  [4].  I felt  the  same  dissatisfaction  with  restrictive  state-of-the-art  techniques  that 
he  felt.  I thought  his  alternative  approach  was  clever  and  imaginative,  and  in  this 
dissertation,  I try  to  make  it  operational. 

Chapter III

THE  PROXY  ITERATION  ALGORITHM 
FOR  DECISION  MAKING  UNDER  CERTAINTY 


3.1  New  Proxy 

The  pseudo-objective  governs  the  speed  at  which  the  successive  approximation 
algorithm  converges.  At  each  iteration,  the  pseudo-objective  is  maximized  as  if  it  were 
the  true  objective  and  a new  trial  point  is  found.  Since  Boyd's  linear  functions  are  very 
poor  pseudo-objectives,  the  trial  points  they  generate  are  very  poor  pseudo-optima.  As  a 
result,  the  sequence  progresses  very  slowly  toward  the  true  optimum. 

I propose  to  use  the  preference  functions  derived  by  Barrager  [3],  and  generalized 
by  Keelin  [20],  as  local  proxies  in  the  successive  approximation  algorithm.  As  a first 
step,  I use  the  global  assessment  procedure  to  encode  a preference  function,  but  instead 
of using this function as a global model, I use it only as a local proxy. At each iteration,
a new  tradeoff  vector  is  assessed  to  update  the  proxy.  Since  the  preference  function  is  a 
very  good  model  in  the  small,  the  algorithm  should  converge  at  a much  higher  rate.  In 
this  proposed  procedure,  the  proxy  is  never  assumed  to  be  the  true  preference  function, 
even  in  the  small;  it  serves  only  as  a mechanism  guiding  the  search  for  the  optimal 
decision. 

Even  though  we  do  not  use  Barrager's  functions  as  global  models,  we  still  must  be 
convinced  that  they  are  suitable  local  proxies.  Barrager  provides  normative  motivations 
for the sum-of-exponentials, sum-of-powers, and Cobb-Douglas preference functions
in  a time-preference  context  and  Keelin  restates  them  in  more  general  terms.  Appendix 
B includes  a brief  summary  of  their  arguments. 

To demonstrate the new algorithm, I will develop my methodology using the
sum-of-exponentials  preference  function  as  a local  proxy  to  solve  problem  (2.1).  I call 
the  local  model  a proxy  rather  than  a pseudo-objective. 

Definition 3.1. Let p[x(d)] be the sum-of-exponentials approximation of the
deterministic preference function V[x(d)]:

$p[x(d) \mid x(d^k), x(d^{k-1})] = -\sum_i a_i e^{-\omega_i x_i(d)}$,

where $a_i$ and $\omega_i$ are determined from tradeoff assessments at $x(d^k)$ and $x(d^{k-1})$
such that $\nabla p[x(d^k)]$ is collinear with $\nabla V[x(d^k)]$, and $\nabla p[x(d^{k-1})]$ is collinear
with $\nabla V[x(d^{k-1})]$.

Theorem 3.1. If the decision maker's preference ordering satisfies the deterministic
preference axioms (2.1-2.4), if X(D) is convex, and if x(d*) is a regular point of the
constraints (see Appendix B), then if d* maximizes $p[x(d) \mid x(d^*), x(d^k)]$ over all
$d \in D$, for any $x(d^k)$, then d* also maximizes $V[x(d)]$ for all $d \in D$.

If the proxy fit at d* achieves its maximum at d*, then the true objective also
achieves  its  maximum  at  d*.  This  theorem  generalizes  Theorem  2.1  since  it  holds  for 
any  concave  proxy,  the  linear  proxy  being  the  simplest  case. 

Figure  3.1  illustrates  the  idea  of  Theorem  3.1  in  a two-dimensional  example.  If  the 
gradient  to  the  objective  has  a component  in  the  feasible  region,  further  improvement 
can be made, so $x(d^k)$ is not optimal. If the gradient to the objective is perpendicular
to  the  constraint,  as  at  x(d*),  no  direction  of  improvement  is  feasible,  so  d*  is 
optimal.  Figure  3.1  gives  a simple  interpretation  of  the  Kuhn-Tucker  optimality 
conditions  (see  Appendix  C).  The  preference  function  V(x)  is  never  specified:  only 
vectors  collinear  with  its  gradient  at  a few  points  are  known.  When  maximizing  the 
concave  objective  with  the  convex  constraint  set,  this  first-order  information  is  both 
necessary  and  sufficient  to  guarantee  optimality.  Nonconvex  feasible  regions  are 
discussed  in  section  3.4. 

Proof: The proxy p(x) and the true objective V(x) are concave, the constraint
set $X(D) = \{x \mid h(x) = 0,\ g(x) \le 0,\ x \ge 0\}$ is convex, and x(d*) is a regular point of
the constraints. Therefore, if d* maximizes $p[x(d) \mid x(d^*), x(d^k)]$ for all $d \in D$, for
any $x(d^k)$, the Kuhn-Tucker necessary conditions guarantee the existence of $\lambda$ and
$\mu \ge 0$ such that

$\nabla p(x^*) + \lambda \nabla h(x^*) + \mu \nabla g(x^*) = 0$.

By construction,

$\nabla p(x^*) = t \nabla V(x^*)$

for some positive scalar t; by defining $\pi = (1/t)\lambda$ and $\nu = (1/t)\mu$, we have

$\nabla V(x^*) + \pi \nabla h(x^*) + \nu \nabla g(x^*) = 0$, $\nu \ge 0$.

But this equation satisfies the Second-Order Sufficiency Conditions (Appendix C) since
V is concave and X(D) is convex, so x(d*) maximizes the true objective. These
identical steps, applied in reverse, prove the converse. Q.E.D.

Theorem  3.1  holds  for  any  concave  approximation  that  is  fit  from  a vector  collinear 
with  VV(x)  at  x(d*)  and  that  satisfies  the  deterministic  preference  axioms.  This 
generalization  is  true  since  the  optimality  criterion  requires  only  the  gradient  of  the 
proxy at x(d*). Therefore, the sum-of-powers preference function, $V(x) = \sum_i a_i x_i^{\omega_i}$,
and the Cobb-Douglas preference function, $V(x) = \sum_i a_i \ln x_i$, could also serve as
proxies  for  V(x). 

Just  as  in  Boyd's  development,  this  theorem  by  itself  does  not  solve  the  decision 
problem  since  it  requires  prior  knowledge  of  the  optimum.  However,  it  motivates  a new 
successive  approximation  algorithm,  using  the  sum-of-exponentials  proxy  at  each 
iteration: 

Step 0. Choose an arbitrary $d^0$ and $d^1$ and assess tradeoffs. Let k = 1.

Step 1. Fit a sum-of-exponentials proxy using tradeoffs at $d^k$ and $d^{k-1}$ and
maximize $p[x(d) \mid x(d^k), x(d^{k-1})]$ over all $d \in D$. This maximization
yields a new $d^{k+1}$.

Step 2. If $d^{k+1} = d^k$, stop; $d^{k+1}$ is the optimum (by Theorem 3.1).
If $d^{k+1} \ne d^k$, assess tradeoffs at $d^{k+1}$. Let k = k + 1 and return to Step 1.
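
As an illustration only, the control flow of Steps 0-2 can be sketched in Python. The names `proxy_iteration`, `assess`, and `fit_and_maximize` are hypothetical, and the demonstration deliberately swaps in a much simpler stand-in: a scalar decision, a hidden quadratic objective whose slope plays the role of the assessed tradeoff, and a secant (quadratic) proxy fitted from the slopes at the current and previous points. It is not the sum-of-exponentials proxy; it only shows how first-order information at two points drives each iteration and how the fixed-point stopping rule works.

```python
def proxy_iteration(d0, d1, assess, fit_and_maximize, tol=1e-9, max_iter=50):
    """Steps 0-2: returns (decision, iterations used)."""
    d_prev, d_curr = d0, d1
    t_prev, t_curr = assess(d_prev), assess(d_curr)          # Step 0
    for k in range(1, max_iter + 1):
        # Step 1: fit the proxy at the current and previous points, then maximize it.
        d_next = fit_and_maximize(d_curr, t_curr, d_prev, t_prev)
        # Step 2: a fixed point of the proxy maximization is the optimum.
        if abs(d_next - d_curr) <= tol:
            return d_next, k
        d_prev, t_prev = d_curr, t_curr
        d_curr, t_curr = d_next, assess(d_next)
    # The bare algorithm is not guaranteed to converge.
    raise RuntimeError("no fixed point found within max_iter")


def hidden_slope(d):
    """Stand-in for a tradeoff assessment: slope of the hidden V(d) = -(d - 3)^2."""
    return -2.0 * (d - 3.0)


def secant_proxy_step(d_curr, s_curr, d_prev, s_prev):
    """Maximize a quadratic proxy whose slopes match the two assessed slopes."""
    if s_curr == s_prev:
        return d_curr
    return d_curr - s_curr * (d_curr - d_prev) / (s_curr - s_prev)


d_star, iters = proxy_iteration(0.0, 1.0, hidden_slope, secant_proxy_step)
print(d_star, iters)
```

Because the stand-in proxy has the same functional form as the hidden objective, the first maximization already lands on the optimum and the second only confirms the fixed point.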

I call  this  new  procedure  the  proxy  iteration  algorithm.  The  algorithm  as  written 
here  is  not  guaranteed  to  converge.  However,  it  is  fail-safe  since  if  it  does  converge,  the 
result  is  the  true  optimum.  Special  devices  will  be  added  in  section  3.4  to  guarantee 
global  convergence.  In  order  to  develop  the  algorithm  in  its  simplest  form,  we  assume 
initially, in sections 3.2 through 3.4, that the decision maker's tradeoffs are consistent
with a deterministic preference function satisfying Axioms 2.1-2.4. We then relax this
assumption  in  Section  3.5  and  examine  the  effects  of  assessment  error. 

3.2  Information  Requirements  of  the  New  Proxy 

The  sum-of-exponentials  function  is  a higher-order  model  and  requires  more 
parameters  to  fit.  Since  an  ordinal  preference  function  is  unique  up  to  an  increasing 
monotonic transformation, the constant $a_1$ can arbitrarily be set equal to one in

$p(x) = -\sum_i a_i e^{-\omega_i x_i}$.

The remaining 2N-1 parameters, $a_2, \ldots, a_N, \omega_1, \omega_2, \ldots, \omega_N$, must be
calculated from tradeoff assessments. At any x, there are N-1 tradeoffs $\lambda_i(x)$, i = 2,
..., N, since $\lambda_1 = 1$ when $x_1$ is the price variable. A full set of N-1 tradeoffs at
each of two points plus a single tradeoff at a third point are required to fit the 2N-1
parameters. The numerical tradeoffs $\lambda_i$ actually assessed relate to the
sum-of-exponentials parameters $a_i$ and $\omega_i$ in the following way:

$\lambda_i(x) = [\partial p(x)/\partial x_i] / [\partial p(x)/\partial x_1] = (\omega_i a_i e^{-\omega_i x_i}) / (\omega_1 a_1 e^{-\omega_1 x_1})$,

so the ratio of $\lambda_i(x^1)$ to $\lambda_i(x^2)$ is

$\lambda_i(x^1)/\lambda_i(x^2) = e^{\omega_1 (x_1^1 - x_1^2) - \omega_i (x_i^1 - x_i^2)}$.

Taking the logarithm of both sides,

$\ln[\lambda_i(x^1)/\lambda_i(x^2)] = \omega_1 (x_1^1 - x_1^2) - \omega_i (x_i^1 - x_i^2)$, i = 2, 3, ..., N.

Since there are N-1 equations and N unknowns, one more $\lambda_j(x^3)$ is assessed to
provide a second equation in $\omega_1$ and $\omega_j$:

$\ln[\lambda_j(x^1)/\lambda_j(x^3)] = \omega_1 (x_1^1 - x_1^3) - \omega_j (x_j^1 - x_j^3)$.

Solving for $\omega_1$ by Cramer's rule,

$\omega_1 = [(x_j^3 - x_j^1) \ln[\lambda_j(x^1)/\lambda_j(x^2)] + (x_j^1 - x_j^2) \ln[\lambda_j(x^1)/\lambda_j(x^3)]] / [(x_1^1 - x_1^2)(x_j^3 - x_j^1) + (x_1^1 - x_1^3)(x_j^1 - x_j^2)]$

and

$\omega_i = [\omega_1 (x_1^1 - x_1^2) - \ln[\lambda_i(x^1)/\lambda_i(x^2)]] / (x_i^1 - x_i^2)$, i = 2, 3, ..., N.

This $\omega$ vector then determines the weighting factors $a_i$:

$a_i = \lambda_i(x^1) (\omega_1/\omega_i) e^{\omega_i x_i^1 - \omega_1 x_1^1}$, i = 2, 3, ..., N, $a_1 = 1$.
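
The fitting equations above can be checked numerically. The following is a minimal Python sketch, assuming the tradeoffs are perfectly consistent with some sum-of-exponentials function; the function names, the zero-based indexing (attribute 0 is the price variable $x_1$), and the test data are illustrative assumptions, not part of the original procedure. It generates tradeoffs from known parameters and recovers them with the Cramer's rule expression for $\omega_1$.

```python
import math


def tradeoffs(x, a, w):
    """lambda_i(x) for p(x) = -sum_i a_i exp(-w_i x_i); attribute 0 is the price variable."""
    base = w[0] * a[0] * math.exp(-w[0] * x[0])
    return [w[i] * a[i] * math.exp(-w[i] * x[i]) / base for i in range(len(x))]


def fit_sum_of_exponentials(x1, lam1, x2, lam2, x3, lam3_j, j):
    """Recover (a, w) from full tradeoff sets at x1, x2 and one tradeoff lam3_j at x3."""
    n = len(x1)
    log12 = math.log(lam1[j] / lam2[j])
    log13 = math.log(lam1[j] / lam3_j)
    A, B = x1[0] - x2[0], x1[j] - x2[j]
    C, D = x1[0] - x3[0], x1[j] - x3[j]
    det = B * C - A * D
    if det == 0.0:
        raise ValueError("the three points do not determine w[0]")
    w = [0.0] * n
    w[0] = (B * log13 - D * log12) / det                     # Cramer's rule
    for i in range(1, n):
        w[i] = (w[0] * A - math.log(lam1[i] / lam2[i])) / (x1[i] - x2[i])
    a = [1.0] + [
        lam1[i] * (w[0] / w[i]) * math.exp(w[i] * x1[i] - w[0] * x1[0])
        for i in range(1, n)
    ]
    return a, w


# Hypothetical "true" parameters and assessment points, used only to test the fit.
a_true, w_true = [1.0, 2.0, 0.5], [0.5, 1.0, 0.8]
p1, p2, p3 = [1.0, 2.0, 3.0], [2.0, 1.0, 1.5], [0.5, 3.0, 2.0]
a_fit, w_fit = fit_sum_of_exponentials(
    p1, tradeoffs(p1, a_true, w_true),
    p2, tradeoffs(p2, a_true, w_true),
    p3, tradeoffs(p3, a_true, w_true)[1], 1,
)
print(a_fit, w_fit)
```

The fit requires that the attribute levels differ between the first two points and that the determinant in the Cramer's rule expression be nonzero; the sketch raises an error in the latter case rather than guessing.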

At  first  thought,  doubling  the  information  requirements  of  the  proxy  seems  to  add  a 
considerable  assessment  burden.  However,  the  extra  parameters  are  already  available 
since  tradeoffs  are  assessed  at  each  iteration.  Instead  of  throwing  away  the  past 
information  as  in  Boyd's  procedure,  my  method  uses  tradeoffs  at  the  current  and 
previous points. Therefore, the sum-of-exponentials proxy has no additional
information cost.

This  choice  of  proxy  function  illustrates  an  important  optimization  concept.  In  any 
search  procedure,  we  want  to  use  the  most  efficient  method;  we  must  weigh  our  desire  to 
use  the  best  model  against  its  information  requirements.  The  model  selection  itself  is  an 
optimization  problem.  In  our  multi-attribute  decision,  normative  assumptions  motivate 
the  sum-of-exponentials  proxy.  It  is  a much  better  model  than  Boyd's  linear 
approximation,  but  requires  twice  as  much  information.  The  situation  here  is  rather 
unique,  however,  since  the  extra  information  is  provided  at  no  extra  cost.  The  algorithm 
generates  all  the  required  information,  so  we  have  no  delicate  balance  to  consider.  The 
higher-order  model  is  superior  since  it  makes  more  efficient  use  of  the  information 
already  available. 

An  analogy  can  be  drawn  to  the  minimization  of  a polynomial  using  its  Taylor 
series  expansion.  The  method  of  steepest  descent  requires  just  the  gradient,  but  achieves 
only  first-order  convergence.  Newton's  method  requires  the  entire  Hessian  matrix,  but 
achieves  at  least  second-order  convergence.  The  proxy  iteration  algorithm  therefore 
plays  the  same  role  relative  to  the  Frank-Wolfe  procedure  that  Newton's  method  plays 
relative  to  steepest  descent. 

All  these  search  procedures  use  models  of  the  true  objective  at  each  stage.  In  our 
decision  problem,  we  use  local  proxies  to  avoid  the  restrictions  of  global  preference 
modeling.  In  other  optimization  problems,  we  use  Taylor  expansion  models  to  avoid  the 
more difficult direct optimization of the true objective. In any case, if the model were
identical  to  the  true  objective,  the  algorithm  would  converge  in  one  step.  Since  the 
model  is  only  a proxy  for  the  true  objective,  each  optimization  yields  only  a trial 
solution  for  the  true  optimum;  the  better  the  proxy  function  fits  the  true  objective,  the 
faster  the  trial  sequence  reaches  the  optimum. 

3.3  Optimizing  the  New  Proxy 

Nonlinear  optimization  problems  are  generally  more  difficult  than  linear 
optimization  problems.  Since  we  replace  Boyd's  linear  pseudo-objective  with  the 
nonlinear  proxy,  we  would  expect  each  maximization  to  be  more  complicated.  In  the 
case  of  linear  constraints,  each  iteration  would  require  a nonlinear  rather  than  a linear 
program. 

The sum-of-exponentials, sum-of-powers, and Cobb-Douglas preference functions,
however, all have a special mathematical structure that simplifies the task. All are
concave and separable, so the maximization with linear constraints, when converted to a
minimization  in  standard  form,  becomes  a convex  separable  problem.  For  the 
sum-of-exponentials  proxy,  the  problem  is  written  below: 

Maximize $-\sum_i a_i e^{-\omega_i x_i}$ subject to $Ax \le b$

or equivalently,

Minimize $\sum_i a_i e^{-\omega_i x_i}$ subject to $Ax \le b$.

Convex separable techniques, using a series of linear programs, make this problem
relatively easy to solve. Most multi-attribute decisions can be modeled effectively with a
few attributes: a large number of attributes would make a problem unmanageable. For
problems of small dimension, the new proxy with the convex programming algorithm so
drastically  decreases  the  number  of  iterations  that  even  though  each  iteration  takes  a 
little  longer,  the  total  computer  time  is  considerably  reduced.  This  special  mathematical 
structure  and  the  algorithms  designed  to  exploit  it  eliminate  any  computational  burden 
associated  with  the  concave  proxy. 

The multi-attribute decision with one linear budget constraint is a frequently
occurring problem. Dual methods provide a closed-form solution to this special case.
Before using the primal-dual technique [25], [26], I motivate the duality concept with
the  intuitive  example  in  Figure  3.2. 

The drawing represents the top management structure of a corporation. For the $i$-th
division, $x_i$ is the quantity of resource used and $f_i(x_i)$ is the profit from operating at
level $x_i$. The president wants to maximize total corporate profits $\sum_i f_i(x_i)$ subject to
the budget constraint $\sum_i x_i \le b$. He is trying to allocate resources as efficiently as
possible. He may find the optimal $(x_1^*, \ldots, x_N^*)$ himself and give orders to his
vice-presidents; this centralized style of management corresponds to primal methods of
optimization. In this problem, primal methods lead to N+1 simultaneous nonlinear
equations in N+1 variables, a very difficult system to solve. Instead of solving the
problem himself, the president may decide to let his controller set a shadow price $\mu$ for
the resource. The vice-presidents would then choose their divisional operating levels and
would be charged $\mu$ dollars for each unit of resource they use. Since the corporate
profit function is concave separable, each vice president could independently choose the
$x_i$ that maximizes his division's profit $f_i(x_i) - \mu x_i$.

In this decentralized scheme, the controller is allocating corporate earnings to the
income statements of the individual divisions. He tries to minimize the divisions'
credits, making the earning power appear to be a corporate rather than divisional
phenomenon. He allocates total credits $\mu b$ by maximizing his share, $\mu b - \sum_i E_i(\mu)$,
with respect to $\mu$, where $E_i(\mu) = f_i[x_i^*(\mu)] - \mu x_i^*(\mu)$, the earnings of division i
parametrized in terms of the shadow price $\mu$. This technique requires N+1
one-dimensional suboptimizations, each of which can be solved by elementary calculus.
The Convex Duality Theorem (see Appendix C) guarantees that dual methods, using this
competitive situation between the controller and the vice-presidents, give the same
solution the president would have reached using primal methods directly. The advantage
of the dual technique is its computational simplicity. The closed-form solution for the
sum-of-exponentials function with one linear budget constraint is derived as follows:

Figure 3.2 Decentralization: An Intuitive View of Duality
(Figure labels: $x_i$, resource level; $f_i(x_i)$, contribution to corporate profits. The
President's Problem: maximize $\sum_i f_i(x_i)$ subject to $\sum_i x_i \le b$.)

Minimize$_x$ $f(x) = \sum_i a_i e^{-\omega_i x_i}$

subject to $\sum_i c_i x_i \le b$, where $b > 0$ and $a_i, \omega_i, c_i > 0\ \forall i$.

For all $x \in X$, the Hessian of the objective F(x) is positive definite and the
constraint Hessian G(x) = [0]. Therefore, the Hessian of the Lagrangian, L(x), is positive
definite everywhere so any local minimum is a global minimum. Using the Convex Duality
Theorem (Appendix C), the dual function is

$\varphi(\mu) = \min_x [\sum_i a_i e^{-\omega_i x_i} + \mu((\sum_i c_i x_i) - b)]$.

Separability implies each $x_i$ can be optimized independently,

$\min_{x_i} (a_i e^{-\omega_i x_i} + \mu c_i x_i)$, $\mu > 0$.

Setting the first derivative equal to zero yields

$x_i = (-1/\omega_i) \ln[(\mu c_i)/(\omega_i a_i)]$. (3.1)

Substituting $x_i$ into $\varphi(\mu)$ and simplifying yields

$\varphi(\mu) = \mu [(\sum_i (c_i/\omega_i)(1 + \ln[(\omega_i a_i)/(\mu c_i)])) - b]$.

Maximizing $\varphi(\mu)$ unconstrained, $\varphi'(\mu) = 0$ at

$\ln \mu^* = [(\sum_i (c_i/\omega_i) \ln[(\omega_i a_i)/c_i]) - b] / (\sum_i c_i/\omega_i)$.

Substituting $\mu^*$ in (3.1) yields the closed-form solution

$x_i^* = [b + (\ln[(\omega_i a_i)/c_i] \sum_j c_j/\omega_j) - (\sum_j (c_j/\omega_j) \ln[(\omega_j a_j)/c_j])] / (\omega_i \sum_j c_j/\omega_j)$. (3.2)
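
A short numerical check of the closed-form solution may help. The sketch below, in Python, assumes the sum-of-exponentials form $\sum_i a_i e^{-\omega_i x_i}$ with a single budget constraint $\sum_i c_i x_i = b$; the function name and the parameter values are illustrative assumptions. It computes the shadow price and allocation, then confirms that the budget is exhausted and that the ratio $\omega_i a_i e^{-\omega_i x_i} / c_i$ is the same for every attribute.

```python
import math


def closed_form_allocation(a, w, c, b):
    """Closed-form allocation and shadow price; nonnegativity is not enforced here."""
    S = sum(ci / wi for ci, wi in zip(c, w))
    T = sum((ci / wi) * math.log(wi * ai / ci) for ai, wi, ci in zip(a, w, c))
    ln_mu = (T - b) / S                      # stationarity of the dual function
    x = [(math.log(wi * ai / ci) - ln_mu) / wi for ai, wi, ci in zip(a, w, c)]
    return x, math.exp(ln_mu)


# Hypothetical parameters chosen so that every x_i comes out positive.
a, w, c, b = [1.0, 2.0, 0.5], [0.5, 1.0, 0.8], [1.0, 1.0, 2.0], 6.0
x_opt, mu = closed_form_allocation(a, w, c, b)
spent = sum(ci * xi for ci, xi in zip(c, x_opt))
prices = [wi * ai * math.exp(-wi * xi) / ci for ai, wi, ci, xi in zip(a, w, c, x_opt)]
print(x_opt, mu, spent, prices)
```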

The  nonnegativity  requirements  were  suppressed  in  the  derivation  above,  so  the 
solution x* must now be checked for violations. A technique illustrating the central
theme of duality can be used to ensure nonnegativity. From equation (3.1), we can
derive the parametric shadow price

$\mu = (\omega_i a_i e^{-\omega_i x_i}) / c_i$.

This equation shows that $\mu$ is the ratio of the marginal revenue of each division to its
marginal cost. Since $\omega, a, c > 0$,

$\mu = (\omega_i a_i)/c_i \iff x_i = 0$;

$\mu < (\omega_i a_i)/c_i \iff x_i > 0$;

$\mu > (\omega_i a_i)/c_i \iff x_i < 0$.

The convex dual technique finds the x* at which the shadow price is equal for all
divisions. Figure 3.3 shows, in the context of the corporate example, that marginal
revenue $\omega_i a_i e^{-\omega_i x_i}$ equals marginal cost $c_1$ at $x_i > 0$; at this level of $x_i$, the shadow
price $\mu_1$ is less than $(\omega_i a_i)/c_1$. For marginal cost $c_2$, the marginal revenue curve
intersects marginal cost $c_2$ at $x_i < 0$; here the shadow price $\mu_2$ is greater than
$(\omega_i a_i)/c_2$. If equation (3.2) prescribes any $x_j < 0$, hence $(\omega_j a_j)/c_j < \mu$, the $j$-th vice
president would reset $x_j$ to zero to eliminate his division's loss (all costs are variable
costs). The entire allocation would then be solved again, with the added restriction $x_j = 0$.

To handle nonnegativity violations efficiently, we rank the quantities $(\omega_i a_i)/c_i$ in
ascending order. All are positive since $\omega, a, c > 0$. If any violations occur, the $x_j$
corresponding to the lowest $(\omega_j a_j)/c_j$ is set equal to zero. The shadow price $\mu$ is then
recalculated excluding this $x_j$. If the next solution is nonnegative, it is optimal; if not,
the $x_j$ corresponding to the next lowest $(\omega_j a_j)/c_j$ is set to zero and the routine is
repeated. Once an $x_j$ is eliminated, it can never be made positive in a subsequent
allocation since its associated $(\omega_j a_j)/c_j$ will always be less than the subsequent $\mu$'s. In
most  practical  problems,  intelligent  modeling  of  the  attributes  will  prevent  negative 
solutions,  but  this  dual  technique  will  resolve  any  violations  should  they  occur. 
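
The elimination routine just described can be sketched as follows. This is an illustrative Python version under the same assumptions as the closed-form derivation (sum-of-exponentials proxy, one budget constraint); the function name and the numbers are hypothetical. Attributes are ranked by $(\omega_i a_i)/c_i$, and whenever the allocation over the active attributes contains a negative level, the attribute with the lowest remaining ratio is fixed at zero and the shadow price is recomputed.

```python
import math


def allocate_nonnegative(a, w, c, b):
    """Closed-form allocation with the lowest-ratio attributes dropped until x >= 0."""
    n = len(a)
    # Active attributes in ascending order of (w_i a_i) / c_i.
    active = sorted(range(n), key=lambda i: w[i] * a[i] / c[i])
    while active:
        S = sum(c[i] / w[i] for i in active)
        T = sum((c[i] / w[i]) * math.log(w[i] * a[i] / c[i]) for i in active)
        ln_mu = (T - b) / S
        x = [0.0] * n
        for i in active:
            x[i] = (math.log(w[i] * a[i] / c[i]) - ln_mu) / w[i]
        if all(x[i] >= 0.0 for i in active):
            return x, math.exp(ln_mu)
        active.pop(0)            # fix the lowest remaining ratio at zero
    raise ValueError("no attributes left; check that b > 0")


# Hypothetical data: the third attribute has a very low ratio and is eliminated.
a, w, c, b = [1.0, 2.0, 0.05], [0.5, 1.0, 0.8], [1.0, 1.0, 2.0], 3.0
x_nn, mu_nn = allocate_nonnegative(a, w, c, b)
print(x_nn, mu_nn)
```

Dropping the lowest ratio is safe because a negative level occurs exactly when an attribute's ratio falls below the shadow price, so if any attribute violates nonnegativity, the lowest-ratio attribute does too.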

Problems  with  two  or  more  constraints  do  not  have  closed-form  solutions  even  with 
dual  methods  since  the  Lagrange  multipliers  themselves  are  entangled  in  simultaneous 
equations.  If  the  constraints  are  linear,  these  problems  can  be  solved  easily  with  convex 
programming  routines.  Nonlinear  constraints,  however,  with  either  Boyd’s  linear 
pseudo-objective  or  my  concave  proxy,  require  more  complex  nonlinear  methods. 

Figure 3.3 Duality: Marginal Revenue vs. Marginal Cost

Having solved the optimization at each step, we conclude that just as the
higher-order proxy had no additional information cost, it adds no significant
computational cost either. The maximization procedures for the Cobb-Douglas and
sum-of-powers proxies are very similar since these functions are also concave separable.
The special form of the Cobb-Douglas function allows a closed-form solution even
with  a quadratic  constraint.  The  derivations  with  these  functions  follow  the  same  steps, 
so  they  are  not  reproduced  here,  but  the  closed-form  solutions  will  be  used  in  Chapter 
IV. 

3.4  Global  Convergence  with  the  New  Proxy 

The  new  proxy  accelerates  the  successive  approximation  algorithm,  but  does  not 
ensure  convergence.  To  guarantee  that  the  algorithm  will  always  converge,  we  must 
verify  that  each  iteration  makes  a sufficient  improvement.  Since  we  can  only  maximize 
the  true  objective  by  proxy,  we  must  ask  the  decision  maker  after  each  maximization  if 
the  new  point  is  preferred  to  the  old  one.  The  decision  maker  cannot  specify  the  entire 
true  objective,  but  we  assume  he  can  answer  simple  choice  questions  comparing  two 
prospects.  If  he  prefers  the  new  point  to  the  old,  the  next  iteration  may  begin.  If  the 
new  point  is  worse,  something  must  be  done  to  find  a better  one. 

At the $k$-th iteration, the proxy is fit from vectors collinear with the gradients of the
true objective at $x^k$ and $x^{k-1}$. The following theorem shows that at each iteration, the
gradient of the true objective has a positive component along the direction of search
obtained by maximizing the proxy. Optimizing the proxy, therefore, guarantees a search
along a direction of genuine improvement.

Theorem 3.2. Given the decision problem (2.1), let $x^{k+1} = \arg\max_x p[x \mid x^k, x^{k-1}]$
over all $x \in X(D)$. Then $\nabla V(x^k)(x^{k+1} - x^k) \ge 0$.

Proof: At any $x^k$ generated by the algorithm, $\nabla V(x^k)$ defines the tangent plane
$H(x \mid x^k) = \{x \mid \nabla V(x^k)(x - x^k) = 0\}$ and the halfspace $H^+(x \mid x^k) = \{x \mid \nabla V(x^k)(x - x^k) \ge 0\}$.
By construction, $\nabla p(x^k) = c \nabla V(x^k)$ for some positive scalar c, so $H(x \mid x^k)$ is also the
tangent plane to p(x) at $x^k$. The indifference curves of the proxy are strictly convex,
so if $p(y) \ge p(x^k)$ for any $y \in X$, then $y \in H^+(x \mid x^k)$. But $x^{k+1}$ maximizes $p[x \mid x^k, x^{k-1}]$,
so $p(x^{k+1}) \ge p(x^k)$. Therefore $x^{k+1} \in H^+(x \mid x^k)$ and $\nabla V(x^k)(x^{k+1} - x^k) \ge 0$.

Q.E.D. 

Figure 3.4 illustrates the concept of maximization by proxy. In both the top and
bottom diagrams, $x^{k+1}$ is the point generated by the $k$-th iteration. The curve labeled
proxy is the slice of the proxy function along the line determined by $x^k$ and $x^{k+1}$. The
quantity $[p(x^{k+1}) - p(x^k)]$, measuring the change in the proxy, must be positive (or
zero if $x^{k+1} = x^k$) since $x^{k+1}$ maximizes the proxy $p[x \mid x^k, x^{k-1}]$ subject to the
constraints. This quantity, represented in Figure 3.4 by the arrow P, is used to predict
the improvement in the true objective. Pretending for the moment the true objective is
known, we draw the corresponding slice of V along the same direction. The arrow A
measures the actual change in V at $x^{k+1}$. This quantity, $[V(x^{k+1}) - V(x^k)]$, can be
negative or positive. If the proxy is a good model of the true objective, as drawn in the
top diagram, the actual change is positive and the iteration makes an improvement. If
the proxy is a poor model, as drawn in the bottom diagram, A and P point in
opposite directions. In this case, $x^{k+1}$ overshoots the true maximum far enough so
$V(x^{k+1}) < V(x^k)$. Since movement in the $x^{k+1}$ direction guarantees improvement if
small enough a step is taken, there must be a point $x^{(k+1)'}$ along the line determined
by $x^k$ and $x^{k+1}$, but somewhere in between, such that $V(x^{(k+1)'}) > V(x^k)$. An
optimization procedure called relaxation is used to find such a point $x^{(k+1)'}$. For some
$\alpha$, $0 < \alpha < 1$,

$x^{(k+1)'} = \alpha x^{k+1} + (1-\alpha) x^k$.

The parameter $\alpha$ is decreased until improvement is achieved. Figure 3.5 shows the
actual change A', measured at $x^{(k+1)'}$, has the same direction as predicted
improvement P. Since the proxy always specifies a feasible direction of improvement
(except at the solution where $x^{k+1} = x^k$), the relaxation procedure is guaranteed to
generate the required $x^{(k+1)'}$.
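
A minimal Python sketch of the relaxation step follows. It assumes the decision maker can only answer pairwise choice questions, modeled here by a hypothetical callback `prefers(y, x)`; the halving rule for $\alpha$ and the hidden test objective are illustrative choices, not prescribed by the text.

```python
def relax(x_k, x_next, prefers, shrink=0.5, max_steps=30):
    """Shrink alpha until alpha*x_next + (1 - alpha)*x_k is preferred to x_k."""
    alpha = 1.0
    for _ in range(max_steps):
        trial = [alpha * xn + (1.0 - alpha) * xk for xk, xn in zip(x_k, x_next)]
        if prefers(trial, x_k):          # one pairwise comparison per trial point
            return trial, alpha
        alpha *= shrink
    return list(x_k), 0.0                # no improvement found; stay at x_k


def hidden_V(x):
    """Stand-in for the unknown preference function, used only to answer comparisons."""
    return -(x[0] - 1.0) ** 2 - (x[1] - 1.0) ** 2


# The proxy optimum (3, 3) overshoots the true optimum (1, 1) starting from (0, 0).
relaxed, alpha_used = relax([0.0, 0.0], [3.0, 3.0],
                            lambda y, x: hidden_V(y) > hidden_V(x))
print(relaxed, alpha_used)
```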

Figure 3.4 Does Iteration Improve or Worsen True Objective?

Figure 3.5 Relaxation Technique (relax and improve)

At this point we must carefully examine the theoretical differences between the
modified Frank-Wolfe algorithm and its analog with the sum-of-exponentials proxy.
We begin our analysis assuming V is known; then we extend our results to the
interactive case where V is unknown.

The original Frank-Wolfe algorithm for maximizing a concave function V over a
convex set X is given below:

Step 0. Choose an initial point $x^1 \in X$. Let k = 1.

Step 1. Determine an optimal solution $y^k$ to the direction-finding problem:

$\max_y \nabla V(x^k) y$ over all $y \in X$.

Let $m^k = y^k - x^k$.

Step 2. Determine an optimal solution $t^k$ to the step-size problem

$\max_t V(x^k + t m^k)$ for $0 \le t \le 1$.

Let $x^{k+1} = x^k + t^k m^k$. Stop if $x^{k+1} = x^k$. If not, let k = k+1 and return
to step 1.

Step 1 works directly on the optimality conditions for the true objective V. It
finds the best feasible direction of improvement for an infinitesimal step. In step 2, an
exact line search is performed, locating the maximum of V in the $m^k$ direction. If $t^k$
= 0, no feasible direction of improvement exists, so $x^k$ is the optimal solution. Wolfe
[34] has proven global convergence of this algorithm.
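
For a known concave V, the original algorithm can be sketched in a few lines of Python. Several details here are assumptions made for the illustration: the feasible set is a polytope described by a list of its vertices (so the direction-finding problem reduces to picking the best vertex), the exact line search is approximated by ternary search, and the quadratic test objective is hypothetical.

```python
def frank_wolfe(V, grad_V, vertices, x, iters=50, tol=1e-10):
    """Frank-Wolfe ascent for a concave V over the convex hull of `vertices`."""
    for _ in range(iters):
        g = grad_V(x)
        # Step 1: a linear objective over a polytope is maximized at a vertex.
        y = max(vertices, key=lambda v: sum(gi * vi for gi, vi in zip(g, v)))
        m = [yi - xi for yi, xi in zip(y, x)]
        # Step 2: line search on 0 <= t <= 1 (ternary search; V is concave in t).
        lo, hi = 0.0, 1.0
        for _ in range(100):
            t1, t2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
            v1 = V([xi + t1 * mi for xi, mi in zip(x, m)])
            v2 = V([xi + t2 * mi for xi, mi in zip(x, m)])
            if v1 < v2:
                lo = t1
            else:
                hi = t2
        t = (lo + hi) / 2.0
        x_new = [xi + t * mi for xi, mi in zip(x, m)]
        if max(abs(p - q) for p, q in zip(x_new, x)) <= tol:
            return x_new                  # x^{k+1} = x^k: stop
        x = x_new
    return x


def V_test(x):
    return -(x[0] - 0.25) ** 2 - (x[1] - 0.25) ** 2


def grad_V_test(x):
    return [-2.0 * (x[0] - 0.25), -2.0 * (x[1] - 0.25)]


# Feasible set: x >= 0, x_0 + x_1 <= 1, given by its three vertices.
simplex = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
x_fw = frank_wolfe(V_test, grad_V_test, simplex, [0.0, 0.0])
print(x_fw, V_test(x_fw))
```

Every iterate is a convex combination of feasible points, so feasibility is maintained without any projection step.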

In  the  modified  Frank-Wolfe  algorithm,  an  inexact  line  search  is  used  at  step  2. 
Even  when  an  exact  line  search  is  performed,  it  yields  only  a trial  solution  to  the 
original  problem.  Since  exact  line  search  is  generally  a costly  procedure,  the  inexact  line 
search  would  be  advantageous  if  convergence  could  be  guaranteed. 

Several  papers  in  the  optimization  literature  investigate  convergence  of  feasible 
direction  algorithms.  Goldstein  [13]  devised  a technique  ensuring  finite  improvement  at 
each  stage  of  the  modified  Frank-Wolfe  algorithm.  At  any  iteration,  the  actual 
improvement  may  be  very  small  even  if  the  predicted  improvement  is  large.  The  true 
objective,  however,  must  improve  by  a sufficient  amount  to  prevent  jamming  at  a 
non-optimal point (jamming occurs when consecutive iterations stall in a non-optimal
level set). The Goldstein test simultaneously tests for improvement and prevents
jamming by requiring the ratio of the actual change to the predicted improvement to
exceed a fixed positive $\epsilon$ at each iteration. For the linear approximation algorithm, the
test is written mathematically:

If for some fixed $\epsilon > 0$,

$[V(x^{k+1}) - V(x^k)] / [\nabla V(x^k)(x^{k+1} - x^k)] \ge \epsilon$,

then $x^{k+1}$ is an improvement. If not, let $x^{(k+1)'} = \alpha x^{k+1} + (1-\alpha) x^k$, $0 < \alpha < 1$, and
continue decreasing $\alpha$ until the criterion is satisfied. A numerical value for $\epsilon$ must be
set as a uniform lower bound on the test ratio.

Armijo [13] designed a very efficient stepsize scheme to accelerate the relaxation
procedure. As the trial sequence improves, the technique specifies smaller and smaller
steps:

$x^{(k+1)'} = \alpha^n x^{k+1} + (1-\alpha^n) x^k$, $0 < \alpha < 1$,

where n increases by one after each relaxation. The Goldstein test coupled with Armijo
relaxation is a very efficient procedure for ensuring improvement at each iteration.
Garcia-Palomares [13] proved global convergence of the Goldstein and Armijo
procedures in linear approximation feasible direction algorithms. His theorem and a
formal statement of the Goldstein and Armijo procedures are included in Appendix D;
the main concepts of his proof are outlined below:

A. Verify that the algorithm generates a bounded feasible direction of
improvement at each iteration.

B. Use the Armijo relaxation procedure to show an improvement satisfying
the Goldstein test can be made at each stage.

C. Use continuity and boundedness to prove the limit of the predicted
improvement is zero. This step ensures the algorithm will converge to some $x^\dagger$.

D. Show that any accumulation point $x^\dagger$ obeys the optimality criterion
$\nabla V(x^\dagger) m^\dagger \le 0$ where $m^\dagger$ is any feasible direction at $x^\dagger$. Thus the algorithm must
converge, and any point to which it converges must be the true optimum.
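
The Goldstein test with Armijo relaxation can be sketched in Python for the case where V is known. The function name, the default values $\epsilon = 0.1$ and $\alpha = 0.5$, and the test objective are assumptions for illustration; the formal statement is the one referred to in Appendix D. The trial point is $\alpha^n x^{k+1} + (1-\alpha^n) x^k$, with n = 0 corresponding to the unrelaxed point.

```python
def goldstein_armijo(V, grad_V, x_k, x_next, eps=0.1, alpha=0.5, max_relax=40):
    """Return (accepted point, number of relaxations n) under the Goldstein test."""
    v_k, g = V(x_k), grad_V(x_k)
    for n in range(max_relax + 1):
        s = alpha ** n
        trial = [s * xn + (1.0 - s) * xk for xk, xn in zip(x_k, x_next)]
        # Predicted improvement from the linear model at x_k.
        predicted = sum(gi * (ti - xi) for gi, ti, xi in zip(g, trial, x_k))
        if predicted <= 0.0:
            return list(x_k), n          # no direction of improvement to test
        if (V(trial) - v_k) / predicted >= eps:
            return trial, n              # actual/predicted ratio is large enough
    return list(x_k), max_relax


def V_known(x):
    return -(x[0] - 1.0) ** 2 - (x[1] - 1.0) ** 2


def grad_V_known(x):
    return [-2.0 * (x[0] - 1.0), -2.0 * (x[1] - 1.0)]


accepted, n_relax = goldstein_armijo(V_known, grad_V_known, [0.0, 0.0], [3.0, 3.0])
print(accepted, n_relax)
```

In this example the unrelaxed point gives a ratio of -0.5 and is rejected, while one relaxation gives a ratio of 0.25 and is accepted.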

Now we examine the convergence of the algorithm with the sum-of-exponentials
proxy. At each iteration, maximization of the sum-of-exponentials proxy generates a
direction $l^k$ different from the $m^k$ obtained by the Frank-Wolfe procedure. The
direction $l^k$ is, in a sense, a global direction of improvement in contrast to the
Frank-Wolfe local direction of improvement. This global direction is a better proxy for
the path to the true optimum. To prove infinite convergence, however, we must still
show that $\nabla V(x^+)\,m^+ \le 0$ at any accumulation point $x^+$ [13],[32]. The
Frank-Wolfe algorithm works directly on this condition, but the sum-of-exponentials
algorithm maximizes $p(x)$ instead.

We may take either of two approaches to ensure global convergence. The first is an
extension of the Goldstein-Armijo procedure. It requires an existence proof of a fixed
$\delta > 0$, such that for all $k$,

$$\frac{V(x^{SE}) - V(x^k)}{\nabla V(x^k)\,(x^{FW} - x^k)} \ge \delta,$$

where $x^{SE}$ and $x^{FW}$ are the points generated by the sum-of-exponentials and linear
proxies, respectively (accounting for relaxation steps if necessary). This existence proof
would require a global restriction on the curvature of $V$. Our proxy approach, however,
was designed to avoid global preference assumptions of this type.

A second approach, motivated by the Spacer Step Theorem (see Appendix C), allows
us to ensure convergence with no additional assumptions. In this approach, a modified
Frank-Wolfe step, using a linear proxy, is inserted periodically between
sum-of-exponentials iterations. It is called a spacer step since it separates disjoint
portions of the complex sequence. The Spacer Step Theorem, when applied to our
decision problem, guarantees that if

a) the spacer step is a step of an algorithm known to converge, and

b) all other steps of the process do not worsen the objective,

then

c) the entire complex sequence converges.

Convergence of the modified Frank-Wolfe algorithm (MFW) guarantees (a) is true, and
Theorem 3.2 guarantees (b) is true. Therefore, (c) implies the proxy iteration algorithm
with the MFW spacer step is globally convergent.
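The scheduling idea behind the spacer step can be illustrated with a short Python sketch. Everything here is a hypothetical stand-in: `proxy_step` and `spacer_step` are toy one-dimensional update rules, not the sum-of-exponentials maximization or the actual MFW step, and `prefers` plays the role of the decision maker's comparison. The sketch only shows how a convergent step is inserted every `spacer_period` iterations while no step is allowed to worsen the objective.

```python
def run_with_spacer_steps(x0, prefers, proxy_step, spacer_step,
                          spacer_period=10, max_iter=25):
    """Alternate proxy steps with a periodic spacer step, keeping a
    candidate only if it is at least as good as the current point."""
    x = x0
    step_kinds = []
    for k in range(1, max_iter + 1):
        if k % spacer_period == 0:
            kind, candidate = "spacer", spacer_step(x)
        else:
            kind, candidate = "proxy", proxy_step(x)
        step_kinds.append(kind)
        if prefers(candidate, x):      # condition (b): never worsen
            x = candidate
    return x, step_kinds

# Toy problem (an assumption): maximize V(x) = -(x - 3)^2 on the real line.
def V(x):
    return -(x - 3.0) ** 2

def prefers(a, b):
    return V(a) >= V(b)

def proxy_step(x):
    return x + 1.0                     # crude step that can overshoot

def spacer_step(x):
    return x + 0.5 * (3.0 - x)         # stand-in for a convergent step

x_final, step_kinds = run_with_spacer_steps(0.0, prefers, proxy_step, spacer_step)
print(x_final, step_kinds.count("spacer"))
```

With a period of ten and twenty-five iterations, two spacer steps are taken; the overshooting proxy candidates are rejected once the maximum has been reached.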

By  making  assumptions  about  the  tradeoff  assessment  error,  we  can  also  prove 
global  convergence  of  the  interactive  algorithm.  These  results  are  discussed  together  with 
consistency  tests  in  section  3.5. 

Stopping  Rule 

When interacting with a decision maker, we can assess tradeoffs at only a small
number of points, so the number of iterations is limited. Instead of examining the
infinite convergence of the sequence $\{x^k\}$, we need to calculate a bound on the error
$V(x^*) - V(x^k)$ from stopping at the $k$-th iteration. The strict concavity of $V$ implies

$$V(x^*) - V(x^k) < \nabla V(x^k)\,(x^* - x^k).$$

Since $(x^* - x^k)$ is one of many feasible directions at $x^k$,

$$\nabla V(x^k)\,(x^* - x^k) \le \max_y \nabla V(x^k)\,y,$$

where $y$ is any feasible direction at $x^k$. By construction, $[\partial V(x^k)/\partial x_1]\,\lambda(x^k)^T = \nabla V(x^k)$,
so we have an upper bound on the error term:

$$V(x^*) - V(x^k) < [\partial V(x^k)/\partial x_1]\,\lambda(x^k)^T y.$$

Axiom 2.2, implying the decision maker's willingness to make tradeoffs, guarantees that
$\partial V(x)/\partial x_1$ is bounded above for all $x \in X(D)$. Assuming an upper bound $K$ on this
scaling factor, we can calculate an upper bound on the error from stopping at $x^k$:

$$V(x^*) - V(x^k) < K\,[\lambda(x^k)^T]\,y \qquad (3.3)$$

The optimal $y$ maximizes the linear approximation of $V$, fit at $x^k$, subject to the
constraints. It is therefore the same point that would be found by a step of the modified
Frank-Wolfe procedure. Since the objective is linear, the percentage
additional computation for the error bound at each iteration is small. No additional
computation is required at the MFW spacer steps.

As a stopping rule for the algorithm, we set a tolerance level c for the error term
$V(x^*) - V(x^k)$. If at iteration $k$ the error bound in equation (3.3) is less than c, then
$x^k$ is an acceptable solution. If the error bound exceeds the tolerance, the iterative
process continues. This stopping rule is the final step of the algorithm.

For the special case with one linear constraint, the solution to

$$\max_y \lambda(x^k)^T y \quad \text{subject to} \quad c^T y \le b,\; y \ge 0$$

is easily found since the entire $b$ is allocated to the $x_j$ with the highest simplex
multiplier. The optimal solution for this subproblem is

$$y_j = b/c_j, \qquad y_i = 0,\; i \ne j, \qquad \text{where } j = \arg\max_i \lambda_i/c_i.$$

The upper bound on the error after the $k$-th iteration is $K\,(b/c_j)\,\lambda_j(x^k)$.
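For the single-constraint case, the bound above reduces to a few arithmetic operations. The following Python sketch shows one way to compute it; the function name and the numerical values of the tradeoffs, costs, budget, and scaling bound `K` are illustrative assumptions, not figures from the dissertation.

```python
def single_constraint_error_bound(lam, c, b, K):
    """Error bound K * (b / c_j) * lam_j for one linear constraint
    c . y <= b, y >= 0, where j maximizes lam_i / c_i."""
    ratios = [l / cj for l, cj in zip(lam, c)]
    j = max(range(len(lam)), key=lambda i: ratios[i])
    y = [0.0] * len(lam)
    y[j] = b / c[j]                    # the whole budget goes to attribute j
    bound = K * (b / c[j]) * lam[j]
    return j, y, bound

# Hypothetical tradeoff vector, cost coefficients, budget, and bound K.
j, y, bound = single_constraint_error_bound(
    lam=[1.0, 3.0, 4.0], c=[1.0, 2.0, 4.0], b=100.0, K=0.01)
print(j, y, bound)
```

Here the second attribute has the largest ratio of tradeoff to cost, so the entire budget is allocated to it. The stopping rule would then compare `bound` with the chosen tolerance.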

Thus far I have developed piece by piece the procedures for assessing tradeoffs,
fitting and maximizing the proxy, ensuring convergence, and testing trial solutions.
Figure 3.6 is a flow chart linking together these individual steps to form the proxy
iteration algorithm. After the constraints are modeled and the tolerance is established,
the analyst assesses the decision maker's tradeoffs. For the sum-of-exponentials proxy,
2N-1 tradeoffs are required. On the first iteration, N-1 tradeoffs at each of two
arbitrarily chosen points $x^1$ and $x^2$ plus one additional tradeoff at a third point
nearby provide the necessary information. The sum-of-exponentials proxy is then fit
from these assessments and maximized with the techniques of section 3.3. The decision
maker must then compare the new maximum $x^3$ with the previous point. If $x^2 \succ x^3$,
relaxation methods are used until $x^3 \succ x^2$. If the upper bound on $[V(x^*) - V(x^3)]$
passes the tolerance test, $x^3$ is the solution. If it fails, the next iteration begins, but
requires only N-1 new tradeoff assessments since N tradeoffs at previous points are
used again. The iterations continue, using the MFW spacer step periodically, until the
error bound at some $x^k$ passes the tolerance test; this $x^k$ is an acceptable solution.
The next $x^{k+1}$ could be found with no additional information since $\lambda(x^k)$ is
required for the tolerance test. If $x^{k+1} \succ x^k$, it would further reduce the error. The
tolerance limit can be adjusted to suit the requirements of any particular problem; the
smaller the tolerance, the larger the number of iterations required.

Before  interacting  with  a real  decision  maker,  I test  the  new  algorithm  with  a few 
simple  examples  in  which  the  computer  plays  the  role  of  the  decision  maker.  Given  a 
deterministic  preference  function,  the  computer  evaluates  the  tradeoffs,  checks  for 
improvement,  and  performs  the  tolerance  test.  There  is  no  assessment  error  when  the 
computer  plays  the  decision-making  role.  Figure  3.7  presents  this  entire  computerized 
procedure. 

Although  this  dissertation  is  written  from  a decision  analytic  perspective,  it  may  be 
of  some  interest  in  optimization  as  well.  Independent  of  its  decision  analysis 
interpretation,  the  algorithm  in  Figure  3.7  is  itself  an  optimization  technique.  Most 
feasible  direction  methods  use  Taylor  series  approximations,  but  this  algorithm  uses 
sum-of-exponentials, sum-of-powers, and Cobb-Douglas proxies. These functions
require  less  information  since  they  are  separable,  implying  all  off-diagonal  elements  of 
the  inverted  Hessian  matrix  are  zero.  They  are  good  proxies  in  our  decision  problem 
because  of  their  normative  motivation.  Perhaps  they  can  be  utilized  effectively  in  other 
search  algorithms  as  well.  I have  not  found  any  iterative  search  algorithms  in  the 
optimization  literature  using  these  functions  as  local  approximations. 

Returning  now  to  our  decision  analysis  context,  we  let  the  computer  play  the  role  of 
the  decision  maker.  In  the  following  two  examples  shown  in  Figures  3.8  and  3.9,  the 
true  preference  functions  are  known  and  closed-form  solutions  exist.  These  simple 
examples  are  run  to  illustrate  the  concepts  of  this  chapter.  A different  tolerance  test 
may be used since the solutions are known in advance. In Figure 3.7, the iterations stop
when each component of $x^k$ is within 1% of the corresponding component of $x^*$:

Stop at $x^k$ if $|x_j^* - x_j^k| / x_j^* < 1\%$ for all $j$.

Figure 3.8 shows the interactive nature of the algorithm. The tradeoff vector and new
maximum at each iteration are listed until the optimum is reached. The
sum-of-exponentials proxy is used in this trivial example. Since the true objective is
also a sum-of-exponentials function, the algorithm converges in two steps: $x^3$
maximizes the first proxy, and $x^4 = x^3$ maximizes the second proxy, identical to the
Maximize $-e^{-0.1x_1} - 4e^{-0.2x_2} - 2e^{-0.4x_3}$

subject to $x_1 + 2x_2 + 4x_3 \le 100$


ENTER NUMBER OF ATTRIBUTES
□:
      3
SELECT DETERMINISTIC PREFERENCE FUNCTION. TYPE THE NUMBER PRECEDING CHOICE
1 = SUM-OF-EXPONENTIALS; 2 = COBB-DOUGLAS; 3 = POSYNOMIAL; 4 = OTHER
□:
      1
ENTER PARAMETERS OF DETERMINISTIC PREFERENCE FUNCTION CHOSEN ABOVE.
ENTER WEIGHTING FACTORS A(1), A(2), ..., A(N)
□:
      1 4 2
ENTER EXPONENTS OMEGA(1), OMEGA(2), ..., OMEGA(N)
□:
      0.1 0.2 0.4
ENTER LINEAR COST CONSTRAINT COEFFICIENTS C(1), C(2), ..., C(N)
□:
      1 2 4
ENTER CONSTRAINT MAXIMUM D
□:
      100
ENTER TOLERANCE OF SOLUTION AS THE PERCENTAGE ERROR IN EACH COMPONENT
□:
      1
ENTER COORDINATES OF INITIAL POINT X(1)
□:
      20 10 15
ENTER COORDINATES OF SECOND POINT X(2)
□:
      10 40 2.5

Figure 3.8a. Example with Sum-of-Exponentials Objective
SUM OF EXPONENTIALS FITTED LOCALLY FROM X(2) AND X(1) YIELDS
NEW MAXIMUM X(3)   26.40  20.13  8.33
PREDICTED IMPROVEMENT IS 0.89094
ACTUAL IMPROVEMENT IS 0.89094
ITERATION IMPROVED OBJECTIVE. USE NEW MAXIMUM IN NEXT ITERATION.

SUM OF EXPONENTIALS FITTED LOCALLY FROM X(3) AND X(2) YIELDS
NEW MAXIMUM X(4)   26.40  20.13  8.33
PREDICTED IMPROVEMENT IS 0.00000
ACTUAL IMPROVEMENT IS 0.00000

OPTIMUM HAS BEEN REACHED. OPTIMAL SOLUTION IS:
   26.40  20.13  8.33

LAGRANGE MULTIPLIER AT SOLUTION IS 0.00713

SEQUENCE OF ALGORITHM

         POINT                 TRADE-OFFS         NUMERAIRE
  20.00  10.00  15.00      1.00  8.00  0.15
  10.00  40.00   2.50      1.00  0.01  8.00       -1.10498
  26.40  20.13   8.33      1.00  2.00  4.00       -0.21404
  26.40  20.13   8.33                             -0.21404

Figure 3.8b. Iterations with Sum-of-Exponentials Example


Maximize $x_1^{0.2}\, x_2^{0.5}\, x_3^{0.3}$

subject to $x_1 + 2x_2 + 4x_3 \le 100$


ENTER NUMBER OF ATTRIBUTES
□:
      3
SELECT DETERMINISTIC PREFERENCE FUNCTION. TYPE THE NUMBER PRECEDING CHOICE
1 = SUM-OF-EXPONENTIALS; 2 = COBB-DOUGLAS; 3 = POSYNOMIAL; 4 = OTHER
□:
      2
ENTER PARAMETERS OF DETERMINISTIC PREFERENCE FUNCTION CHOSEN ABOVE.
ENTER EXPONENTS BETA(1), BETA(2), ..., BETA(N)
□:
      0.2 0.5 0.3
ENTER LINEAR COST CONSTRAINT COEFFICIENTS C(1), C(2), ..., C(N)
□:
      1 2 4
ENTER CONSTRAINT MAXIMUM D
□:
      100
ENTER TOLERANCE OF SOLUTION AS THE PERCENTAGE ERROR IN EACH COMPONENT
□:
      1
ENTER COORDINATES OF INITIAL POINT X(1)
□:
      20 10 15
ENTER COORDINATES OF SECOND POINT X(2)
□:
      10 40 2.5

Figure 3.9a Example with Cobb-Douglas Objective
SUM OF EXPONENTIALS FITTED LOCALLY FROM X(2) AND X(1) YIELDS
NEW MAXIMUM X(3)   16.29  24.74  8.56
PREDICTED IMPROVEMENT IS 0.54992
ACTUAL IMPROVEMENT IS 3.35486
ITERATION IMPROVED OBJECTIVE. USE NEW MAXIMUM IN NEXT ITERATION.

SUM OF EXPONENTIALS FITTED LOCALLY FROM X(3) AND X(2) YIELDS
NEW MAXIMUM X(4)   19.69  24.65  7.75
PREDICTED IMPROVEMENT IS 0.01058
ACTUAL IMPROVEMENT IS 0.10683
ITERATION IMPROVED OBJECTIVE. USE NEW MAXIMUM IN NEXT ITERATION.

SUM OF EXPONENTIALS FITTED LOCALLY FROM X(4) AND X(3) YIELDS
NEW MAXIMUM X(5)   20.19  24.71  7.59
PREDICTED IMPROVEMENT IS 0.00028
ACTUAL IMPROVEMENT IS 0.00288
ITERATION IMPROVED OBJECTIVE. USE NEW MAXIMUM IN NEXT ITERATION.

SUM OF EXPONENTIALS FITTED LOCALLY FROM X(5) AND X(4) YIELDS
NEW MAXIMUM X(6)   19.81  25.19  7.45
PREDICTED IMPROVEMENT IS 0.00020
ACTUAL IMPROVEMENT IS 0.00058
ITERATION IMPROVED OBJECTIVE. USE NEW MAXIMUM IN NEXT ITERATION.

SUM OF EXPONENTIALS FITTED LOCALLY FROM X(6) AND X(5) YIELDS
NEW MAXIMUM X(7)   20.00  25.00  7.50
PREDICTED IMPROVEMENT IS 0.00006
ACTUAL IMPROVEMENT IS 0.00051
ITERATION IMPROVED OBJECTIVE. USE NEW MAXIMUM IN NEXT ITERATION.

OPTIMUM HAS BEEN REACHED. OPTIMAL SOLUTION IS:
   20.00  25.00  7.50

Figure 3.9b Iterations with Cobb-Douglas Example


LAGRANGE MULTIPLIER AT SOLUTION IS 0.01836

SEQUENCE OF ALGORITHM

         POINT                 TRADE-OFFS         NUMERAIRE
  20.00  10.00  15.00      1.00  5.00  2.00
  10.00  40.00   2.50      1.00  0.63  6.00       13.19508
  16.29  24.74   8.56      1.00  1.65  2.86       16.54994
  19.69  24.65   7.75      1.00  2.00  3.81       16.65677
  20.19  24.71   7.59      1.00  2.04  3.99       16.65965
  19.81  25.19   7.45      1.00  1.97  3.99       16.66023
  20.00  25.00   7.50                             16.66074

Figure 3.9c Solution of Cobb-Douglas Example
first. The predicted improvement is equal to the actual improvement in both iterations;
both are zero at the solution. Figure 3.8b shows the optimal solution and its associated
shadow price in terms of the numeraire V(x).
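The solution and shadow price reported in Figure 3.8b can be checked against the first-order conditions of the example. The Python sketch below assumes an interior optimum (the nonnegativity constraints are not binding) and solves $a_j \omega_j e^{-\omega_j x_j} = \mu c_j$ together with the budget constraint in closed form. The function name is hypothetical, and this derivation is offered as a cross-check rather than as the primal-dual technique of section 3.3.

```python
import math

def sum_of_exponentials_optimum(a, w, c, b):
    """Maximize -sum(a_j * exp(-w_j * x_j)) subject to sum(c_j * x_j) = b,
    assuming the optimum is interior (all x_j > 0).

    First-order conditions: a_j * w_j * exp(-w_j * x_j) = mu * c_j."""
    weights = [cj / wj for cj, wj in zip(c, w)]
    logs = [math.log(aj * wj / cj) for aj, wj, cj in zip(a, w, c)]
    log_mu = (sum(wt * lg for wt, lg in zip(weights, logs)) - b) / sum(weights)
    x = [(lg - log_mu) / wj for lg, wj in zip(logs, w)]
    return x, math.exp(log_mu)

# Parameters entered in Figure 3.8a.
a = [1.0, 4.0, 2.0]
w = [0.1, 0.2, 0.4]
c = [1.0, 2.0, 4.0]
b = 100.0

x_opt, mu = sum_of_exponentials_optimum(a, w, c, b)
print([round(v, 2) for v in x_opt], round(mu, 5))
```

The resulting point and multiplier agree with the values printed in Figure 3.8b to the precision shown there.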

The example in Figure 3.9 is more interesting since the sum-of-exponentials proxy
is used to optimize a Cobb-Douglas true objective. The tolerance test is passed after
five iterations. Each new maximum is an improvement, so no relaxations are required.
The computer solution agrees with the analytic solution to seven decimal places. In these
examples, the period of the spacer step was ten; only five iterations were required, so the
MFW step was never used. In the next chapter, this same problem is solved with Boyd's
algorithm and the rates of convergence are compared.
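The analytic solution of the Cobb-Douglas example, and the componentwise 1% stopping test, can be reproduced with a few lines of Python. The sketch relies on the standard closed form for a Cobb-Douglas objective with one linear constraint, $x_j^* = \beta_j b / (c_j \sum_i \beta_i)$; the function names are hypothetical, and the trial points are copied from Figure 3.9b.

```python
import math

def cobb_douglas_optimum(beta, c, b):
    """Closed-form maximizer of prod(x_j ** beta_j) subject to
    sum(c_j * x_j) = b, x >= 0."""
    total = sum(beta)
    return [bj * b / (cj * total) for bj, cj in zip(beta, c)]

def cobb_douglas_value(x, beta):
    return math.prod(xj ** bj for xj, bj in zip(x, beta))

def within_tolerance(x, x_star, tol=0.01):
    """Stop if |x_j* - x_j| / x_j* < tol for every component j."""
    return all(abs(xs - xj) / xs < tol for xj, xs in zip(x, x_star))

# Parameters entered in Figure 3.9a.
beta = [0.2, 0.5, 0.3]
c = [1.0, 2.0, 4.0]
b = 100.0

x_star = cobb_douglas_optimum(beta, c, b)
v_star = cobb_douglas_value(x_star, beta)
print(x_star, round(v_star, 5))

# Trial points X(5) and X(6) as printed in Figure 3.9b.
print(within_tolerance([20.19, 24.71, 7.59], x_star))
print(within_tolerance([19.81, 25.19, 7.45], x_star))
```

The closed form gives approximately (20, 25, 7.5), matching the optimum reported in the figure. Of the two trial points checked, the earlier one misses the 1% test in its second component and the later one satisfies it in every component.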

Nonconvex  Feasible  Regions 

In proving Theorem 3.1, we assumed that the set of feasible alternatives was
convex. If $X(D)$ is not convex, the maximization of $p[x(d) \mid x(d^*)]$ at $x(d^*)$
is not a necessary condition for the maximization of $V(x)$ since there may be no
hyperplane that separates the constraint and indifference surfaces at $x(d^*)$. The
condition is still sufficient if $X(D)$ is a connected set, but there is no guarantee it will
obtain. The convexity assumption prohibits gaps and discrete outcomes. In the language
of production functions, it corresponds to constant or decreasing returns to scale.
Theorem 3.1 is consistent with the Arrow-Hurwicz result [2] that pricing systems
allocate resources efficiently when there are no increasing returns to scale. Convexity is
a plausible assumption, applicable to many multi-attribute decisions with continuous
outcome variables.

The proxy iteration algorithm may still be useful in cases where $X(D)$ is not
convex. If $X(D)$ is locally convex in the region of the optimum, maximization of the
proxy at $x(d^*)$ is still a necessary and sufficient condition for the maximization of
$V(x)$. If the constraint has a gap near the optimum, the algorithm may be applied to its
convex envelope. If the gap is small, sensitivity analysis may reveal the true optimum
after the solution to this artificial problem is found. Discrete outcome decisions present
a greater obstacle since small steps in a direction of improvement may be infeasible
when a large step is allowed. Consequently, the algorithm could stop at a non-optimal
point. Traditional multi-attribute procedures should be used for discrete problems that
have no meaningful continuous analogs.

3.5  Consistency  of  Tradeoff  Assessments 

We  have  assumed  up  to  this  point  that  at  each  iteration  the  decision  maker  provides 
tradeoffs  consistent  with  a continuously  differentiable  deterministic  preference 
function.  This  assumption  has  allowed  us  to  develop  the  proxy  iteration  algorithm 
without  complications  arising  from  assessment  error.  In  this  section,  we  relax  the 
assumption  and  examine  techniques  for  checking  tradeoff  consistency.  We  also  examine 
the  effects  of  assessment  error  on  the  convergence  of  the  algorithm. 

I view  the  proxy  iteration  algorithm  as  a learning  process  for  the  decision  maker. 
At  each  iteration,  the  decision  maker  learns  more  about  his  underlying  preferences  as  he 
sees  the  implications  of  his  previous  assessments.  He  benefits  not  only  from  this 
feedback,  but  also  gets  more  practice  interpreting  and  responding  to  the  analyst's 
questions  at  each  step.  Viewing  the  interactive  algorithm  as  a learning  process,  we  can 
realistically  assume  that  the  tradeoff  assessments  become  better  representations  of  the 
decision  maker's  preferences  as  the  iterations  proceed.  Any  cumulative  consistency 
checking  scheme  should  embody  this  idea. 

I augment the algorithm in Figure 3.6 with two types of consistency tests, the first
testing tradeoff consistency at a single point, and the second testing consistency at
successive points. The single point test is a standard procedure described in almost every
tradeoff assessment scheme in the literature [4],[14]. It requires a second set of
assessments at each point, using a different price variable $x_k$. For price variable $x_j$,
$\lambda_{ij} = dx_i/dx_j$; for price variable $x_k$, $\lambda_{ik} = dx_i/dx_k$. The chain rule implies $\lambda_{ik} = \lambda_{ij}\,\lambda_{jk}$.
Since only N-1 unique tradeoffs among the attributes exist at any point, the second set
can be used to measure the discrepancy:

$$\%\ \text{error} = \left| \frac{\Delta x_i}{\Delta x_k} - \frac{\Delta x_i}{\Delta x_j}\,\frac{\Delta x_j}{\Delta x_k} \right| \Big/ \left| \frac{\Delta x_i}{\Delta x_k} \right|$$
Certainly we would not expect exact agreement. Instead, we set a reasonable tolerance
level; if the discrepancy exceeds the tolerance, the analyst should explain the
inconsistency to the decision maker and reassess the tradeoffs until the discrepancy is
resolved.
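The single point test amounts to comparing a directly assessed tradeoff with the one implied by the chain rule. A minimal Python sketch follows, assuming the three tradeoffs are available as plain numbers; the function names, the 10% tolerance, and the sample assessments are hypothetical.

```python
def chain_rule_discrepancy(t_ik, t_ij, t_jk):
    """Relative gap between the directly assessed tradeoff t_ik and the
    value t_ij * t_jk implied by the chain rule."""
    return abs(t_ik - t_ij * t_jk) / abs(t_ik)

def needs_reassessment(t_ik, t_ij, t_jk, tolerance=0.10):
    """True if the discrepancy exceeds the chosen tolerance level."""
    return chain_rule_discrepancy(t_ik, t_ij, t_jk) > tolerance

# Hypothetical assessments: the direct tradeoff is 2.0 in both cases.
print(chain_rule_discrepancy(2.0, 0.5, 3.0))   # implied value 1.5
print(needs_reassessment(2.0, 0.5, 3.0))
print(needs_reassessment(2.0, 0.5, 4.2))       # implied value 2.1
```

In the first case the implied tradeoff differs from the direct one by a quarter of its value, so the analyst would revisit the assessments; in the second the gap is about 5% and the tradeoffs would be accepted.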

This  scheme  checks  tradeoff  consistency  at  a single  outcome,  but  does  not  reveal 
information  about  the  shape  of  the  indifference  curves.  Axiom  2.4  requires  strictly 
convex  indifference  curves  since  the  marginal  rates  of  substitution  are  decreasing.  To 
check  for  violations  of  this  axiom,  tradeoffs  at  outcomes  on  the  same  indifference  curve 
must  be  compared.  Our  algorithm,  however,  never  generates  outcomes  on  the  same 
indifference  curve  since  each  iteration  yields  an  improvement.  Since  our  primary 
motivation  is  the  reduction  of  the  total  number  of  assessments  required  to  reach  the 
optimum,  we  certainly  do  not  want  to  make  many  extra  assessments  just  to  check 
consistency.  We  must  try  to  strike  a balance  between  the  number  of  assessments 
required  to  reach  a given  consistency  level  and  the  level  of  consistency  required  to  reach 
the  optimum  in  a small  number  of  iterations.  If  the  tradeoff  assessments  are  very 
inconsistent,  the  proxies  will  be  poor  preference  models  and  the  resulting  trial  sequence 
will converge very slowly at best.

Fortunately, the algorithm in Figure 3.6 provides a consistency check for tradeoffs
at successive points without requiring any extra assessments. It is easy to prove that the
sum-of-exponentials proxy obeys Axiom 2.4 if and only if the parameters $a_j$ and $\omega_j$
are strictly positive (see Appendix B; analogous results hold for the Cobb-Douglas and
sum-of-powers proxies as well). When the proxy is fit from the current and previous
tradeoffs at each iteration, the sign of each $a_j$ and $\omega_j$ can be checked. We must
realize that this scheme checks for decreasing marginal rates of substitution of the proxy
only. The indifference curves of the true objective are unknown and their convexity
cannot be verified directly without numerous additional assessments. If any $a_j$ or $\omega_j$
is negative or zero, the tradeoffs should be reassessed. If the parameter remains
nonpositive, the algorithm could continue with an MFW spacer step using just the
tradeoffs at the current point. Figure 3.10 shows the proxy iteration algorithm with
both consistency tests.

Figure 3.10 Proxy Iteration Algorithm with Consistency Tests
Although I do not suggest a least squares approach, it could be used to handle the
consistency problem. Instead of fitting the proxy exactly from 2N-1 assessments, the
parameters could be chosen to minimize the error from all or any subset of previous
assessments, assigning the heaviest relative weights to the most recent tradeoffs. Initially,
the idea of utilizing all previous information sounds promising, but after setting up the
problem, we see that this nonlinear least squares fit is a very difficult optimization
problem. For the sum-of-exponentials proxy, we must minimize with respect to
$\omega_1, \omega_2, \ldots, \omega_N$ and $a_2, a_3, \ldots, a_N$ at each iteration, where

The first index sums over all previous assessments. Quasi-Newton methods or
successive linear approximation on the normal equations, even with least squares
updating procedures, would require a huge additional amount of computer time at each
iteration. Even if we ignored the computation cost, we may still have poor parameter
fits since each $a_j$ counterbalances the effect of its associated $\omega_j$.

This  discussion  of  the  least  squares  approach  reminds  us  that  our  primary  goal  is 
finding  the  optimum  while  incurring  a reasonably  small  assessment  and  computation 
cost.  Complex  time-consuming  consistency  checks  should  not  be  used  since  we  still  have 
only  a proxy  at  each  iteration  regardless  of  the  effort  expended.  The  two  consistency 
checks  I employ,  each  requiring  minimal  extra  assessment  and  computation,  should  be 
sufficient  to  keep  the  trial  sequence  from  going  astray. 

Research in a similar vein at U.C.L.A. supports this conclusion. Geoffrion, Dyer,
and Feinberg [14],[8] and Hogan [15] developed an interactive Frank-Wolfe
algorithm. Hogan investigated the effects of error in the gradient when using the
modified Frank-Wolfe method. At each iteration, he added a noise vector $\beta^k$ to the
true gradient $\nabla V(x^k)$. At the $k$-th iteration, his noisy gradient $z^k$ is the sum of the true
gradient and the error term,

$$z^k = \nabla V(x^k)^T + \beta^k.$$

Hogan [15] proved that if $\beta^k \to 0$, then for any infinite sequence $\{x^k\}$
generated by the algorithm, every accumulation point is a solution. The
assumption $\beta^k \to 0$ implies in our decision-making context that the assessment
procedure is a learning process with diminishing error. Dyer [9] extended Hogan's
result for the case where $\beta^k \to \beta^\infty \ne 0$. At each iteration, he replaced $\nabla V(x^k)$ by
$\alpha^k \nabla V(x^k) + \beta^k$, where $\alpha^k$ is the scaling factor and $\beta^k$ is the assessment error. Dyer
defined the approximate solution set

$$\Gamma(\delta) = \{\, x \in X(D) \mid V(x) \ge V(y) - \delta \;\; \forall y \in X(D) \,\}$$

and proved that if

a) $\lim_{k\to\infty} \alpha^k = \alpha^\infty$ and $\lim_{k\to\infty} \beta^k = \beta^\infty$, where $0 < \|\alpha^\infty\| < \infty$ and
$\|\beta^\infty\| < \infty$,

then either

b) the modified Frank-Wolfe algorithm terminates at some finite iteration $k$ and
$x^k \in \Gamma[(\xi^k/\alpha^k)\|\beta^k\|]$, where $\xi^k = \max_{y \in X(D)} \|y - x^k\|$,

or

c) the MFW algorithm generates an infinite sequence $\{x^k\}$, and every
accumulation point $x^\infty$ of $\{x^k\}$ is contained in $\Gamma[(\xi^\infty/\alpha^\infty)\|\beta^\infty\|]$, where
$\xi^\infty = \max_{y \in X(D)} \|y - x^\infty\| = \lim_{k\to\infty} \xi^k$.

Appendix E contains the full statement of Dyer's result. The quantity $[(\xi^k/\alpha^k)\|\beta^k\|]$
bounds the error at any iteration $k$; it is the analog of equation (3.3), adjusted for the
assessment error $\beta^k$. As long as the assessment error is bounded, the modified
Frank-Wolfe algorithm converges to a solution whose error is bounded. In this case,
actual computation of the error bound at iteration $k$ requires specification of an upper
bound on $\|\beta^k\|$. The Spacer Step Theorem guarantees that Dyer's error analysis holds
for the sum-of-exponentials algorithm with the MFW spacer step.

Geoffrion, Dyer, and Feinberg [14] applied their interactive Frank-Wolfe
procedure to the allocation of faculty time among teaching, research, and departmental
duties at the U.C.L.A. Graduate School of Management. They claim the decision makers
were able to provide the required information "without significant difficulty". In a
follow-up paper [9], Dyer concludes:

Thus  the  use  of  interactive  programming  is  not  dependent  on  the 
assumption  of  subject  behavior  consistent  with  the  existence  of  a 
preference relation weakly ordering all alternatives. Rather,
responses  that  reflect  the  'human  element'  in  the  form  of  random 
errors  and  inconsistencies  does  not  appear  to  be  a significant 
hindrance  to  the  use  of  this  robust  procedure. 

This  chapter  concludes  the  theoretical  development  of  the  proxy  iteration  algorithm 
under  certainty.  In  the  next  chapter,  I compare  the  normatively  motivated  proxies  to  the 
linear  proxy  in  a variety  of  examples,  letting  the  computer  play  the  role  of  the  decision 
maker.  I also  state  results  on  the  initial  rates  of  convergence  of  the  algorithms. 


Chapter  IV 

THE  NEW  ALGORITHM  VERSUS  THE  OLD 

4.1  An  Example  With  Boyd’s  Algorithm 

To see if the new algorithm is faster, I also programmed Boyd's technique and used
it to solve the example in Figure 3.9. The trial sequence, beginning at the same $x^1$, is
listed in Table 4.1. The algorithm generated twenty-six trial points before converging.
The list is long because the linear pseudo-objective is a poor preference model; the first
trial point at each iteration is an extreme point clearly inferior to $x^k$. After observing
Table 4.1, Boyd commented that if the first step of each iteration had been used only to
guide the search, and not as a trial point, assessments would have been required at only
seventeen points. In contrast, the algorithm with the sum-of-exponentials proxy found a
more precise solution to this same problem requiring assessments at only six points.

Assessment cost and computation cost are the two criteria to be considered when
comparing the algorithms. The total assessment requirement is the key factor since it is
by far the most time-consuming and most costly part of the procedure. In the example
of Figure 3.9, Boyd's algorithm required assessments at three times as many points as the
new algorithm. With respect to the second criterion, Boyd's method used three times as
many CPU seconds as well. Although the new technique seems far superior in this
example, no general conclusions can be drawn. In the next section, I systematically
compare the efficiency of the different proxies.

4.2  Comparison  of  the  New  and  Old  Proxies 

With  the  computer  again  playing  the  role  of  the  decision  maker,  three  proxies  are 
tested  for  three  objectives  with  three  different  constraint  sets.  Since  the  true  objectives 
are  given,  the  optimal  solution  for  each  problem  can  be  determined  beforehand,  and  the 
stopping  criterion  of  Figures  3.8  and  3.9  can  be  used: 

Stop at $x^k$ if $|x_j^* - x_j^k| / x_j^* < 1\%$ for all $j$.
Table 4.1  Trial Sequence of Boyd's Algorithm
with Cobb-Douglas Example

         POINT              NUMERAIRE
  20.00  10.00  15.00       12.97279
   0.05  49.95   0.01        1.04264
  16.01  17.99  12.00       15.56594
   0.05  49.95   0.01        1.04264
  12.82  24.38   9.60       16.21217
  99.90   0.02   0.01        0.10665
  30.23  19.51   7.69       16.10451
  16.30  23.41   9.22       16.46474
  99.90   0.02   0.01        0.10665
  33.02  18.73   7.38       15.86509
  19.65  22.47   8.85       16.54230
   0.05  49.95   0.01        1.04264
  15.73  27.97   7.08       16.51001
  18.86  23.57   8.50       16.60035
   0.05  49.95   0.01        1.04264
  15.10  28.85   6.80       16.42956
  18.11  24.63   8.16       16.62568
  99.90   0.02   0.01        0.10665
  34.47  19.71   6.53       15.82191
  21.38  23.64   7.83       16.63563
   0.05  49.95   0.01        1.04264
  17.11  28.90   6.27       16.45580
  20.53  24.69   7.52       16.65896
   0.05  49.95   0.01        1.04264
  16.43  29.75   6.02       16.35724
  19.71  25.71   7.22       16.65357
  20.36  24.90   7.46       16.66006
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For  each  proxy,  we  want  to  compare  the  number  of  trial  points  generated  before  the 
tolerance  test  is  passed.  To  give  Boyd's  algorithm  its  best  performance,  the  first  point  at 
each  iteration  is  used  only  to  specify  the  direction;  it  is  not  counted  as  a trial  point. 

Table 4.2 shows the results for the linear, Cobb-Douglass, and sum-of-exponentials
proxies in problems with one linear constraint. To cover a wide range of
problems representative of those likely to be encountered in practice, the experiments
are performed for each proxy with both symmetric and skew sum-of-exponentials,
Cobb-Douglass, and sum-of-powers objectives. In a symmetric objective, all attributes
have approximately equal importance; in a skew objective, some attributes are weighted
more heavily than others. To avoid any special properties of low dimension, the
programs are run for both three- and six-attribute problems.

In  all  thirty-six  examples  of  Table  4.2,  the  maximizations  at  each  iteration  have 
closed-form  solutions.  Maximizing  the  linear  proxy  with  one  linear  constraint  is  a 
special  linear  programming  problem.  The  optimum  can  be  found  analytically  since  it  is 
the  extreme  point  with  the  highest  simplex  multiplier.  For  the  strictly  concave  proxies, 
the  primal-dual  technique  of  section  3.3  provides  an  analytic  solution.  The  computer 
runs  are  fast  since  the  closed-form  maximizations  are  programmed  directly  into  the 
algorithm. 
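
As an illustration only, the two kinds of closed-form maximization can be sketched in Python. This is not the program used for the experiments. The linear case assumes positive weights and costs, so the best extreme point is the one with the largest weight-to-cost ratio. The concave case uses the ordinary Lagrange first-order conditions for an additive logarithmic (Cobb-Douglass) proxy, not the primal-dual technique of section 3.3. The function names and the sample constraint (costs 1, 2, 4 and budget 100) are hypothetical.

```python
def maximize_linear_proxy(weights, costs, budget):
    """Maximize sum(w_i * x_i) subject to sum(c_i * x_i) <= budget, x >= 0.

    Assumes positive weights and costs. With one linear constraint the
    optimum is an extreme point: the whole budget goes to the attribute
    with the largest ratio w_i / c_i.
    """
    best = max(range(len(weights)), key=lambda i: weights[i] / costs[i])
    x = [0.0] * len(weights)
    x[best] = budget / costs[best]
    return x


def maximize_cobb_douglas_proxy(betas, costs, budget):
    """Maximize sum(beta_i * ln x_i) subject to sum(c_i * x_i) <= budget.

    The first-order conditions give x_i = beta_i * budget / (c_i * sum(beta)).
    """
    total = sum(betas)
    return [b * budget / (c * total) for b, c in zip(betas, costs)]


if __name__ == "__main__":
    costs, budget = [1.0, 2.0, 4.0], 100.0   # hypothetical constraint
    print(maximize_linear_proxy([1.0, 1.0, 1.0], costs, budget))
    print(maximize_cobb_douglas_proxy([1.0, 1.0, 1.0], costs, budget))
```

The linear maximizer always lands on an outlying extreme point, while the logarithmic proxy returns an interior point that spends the budget in proportion to the fitted parameters.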

Table  4.2  shows  the  number  of  trial  points  and  the  number  of  CPU  seconds 
required  for  convergence  at  the  1%  tolerance  level.  It  would  be  unfair  to  compare 
Boyd's  linear  proxy  to  a concave  proxy  that  was  identical  to  the  true  objective.  The 
results using the sum-of-exponentials proxy for the sum-of-exponentials objective and
the Cobb-Douglass proxy for the Cobb-Douglass objective serve only as reference
points. In each example of Table 4.2, the concave proxies are much more efficient than
the linear proxy. For the sum-of-exponentials objective, Boyd's linear proxy uses 40% to 50%
more assessments than the Cobb-Douglass proxy in three dimensions, and 500% to 800%
more in six dimensions. For the three-dimensional Cobb-Douglass objective, Boyd's
proxy requires roughly five times as many assessments as the sum-of-exponentials proxy. In the
six-dimensional problem, I stopped the linear algorithm at 4% and 5% tolerance levels
Table 4.2  Comparison of Proxy Functions

Constraint: Linear (1)
Tolerance: 1%

                                                            PROXY
PREFERENCE                        LINEAR                    COBB-DOUGLASS             SUM-of-EXPO.
FUNCTION       dimen-             # pts. req.    CPU        # pts. req.    CPU        # pts. req.    CPU
               sion   shape       assessment     seconds    assessment     seconds    assessment     seconds

Sum-of-Expo.   3      symmetric   18             2.8        12             1.6        3              0.6
               3      skew        14             2.4        10             1.3        3              0.7
               6      symmetric   76             12.6       12             1.7        3              0.7
               6      skew        71             10.9       8              1.1        3              0.7

Cobb-Douglass  3      symmetric   22             3.1        2              0.3        5              1.3
               3      skew        28             4.4        2              0.3        5              1.1
               6      symmetric   45+ (4% Tol.)  7.2        2              0.3        7              1.6
               6      skew        61+ (5% Tol.)  9.8        2              0.3        8              2.2

Sum-of-Powers  3      symmetric   5 (a)          0.7        5 (a)          0.8        4 (a)          1.0
               3      skew        24             3.6        7              1.1        8              2.4
Nonneg.        6      symmetric   44+ (5% Tol.)  7.5        10             1.9        6              1.5
               6      skew        72+ (2% Tol.)  11.4       6              1.0        6              1.6

(a) excellent starting pt.

since  it  had  already  used  seven  times  as  many  assessments  and  was  improving  very 
slowly.  The  sum-of-powers  example  allows  us  to  compare  all  three  proxies  at  once. 
Even  though  the  Cobb-Douglass  function  is  a special  case  of  the  sum-of-powers,  the 
sum-of-exponentials  proxy  is  just  as  good  or  better  since  it  is  fit  from  twice  as  much 
information.  The  Cobb-Douglass  proxy  requires  N-1  parameters  from  one  point, 
whereas  the  sum-of-exponentials  proxy  requires  2N-1  tradeoffs  from  three  points. 
The  convergence  results  with  the  sum-of-powers  objective  follow  the  same  pattern, 
demonstrating  the  superiority  of  the  normatively  motivated  proxies.  Their  small 
assessment requirement and computation cost make the new algorithm practical, whereas
the  assessment  demand  with  the  linear  proxy  is  prohibitive. 
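
The percentage comparisons quoted for Table 4.2 are simple ratios of trial-point counts, and the short sketch below recomputes them. The counts are copied from the table; the dictionary layout and function names are illustrative assumptions.

```python
# Trial points required at the 1% tolerance level, copied from Table 4.2.
# Each value is (linear proxy, comparison proxy).
SUM_OF_EXPO_OBJECTIVE = {          # comparison proxy: Cobb-Douglass
    (3, "symmetric"): (18, 12),
    (3, "skew"): (14, 10),
    (6, "symmetric"): (76, 12),
    (6, "skew"): (71, 8),
}
COBB_DOUGLASS_OBJECTIVE_3D = {     # comparison proxy: sum-of-exponentials
    "symmetric": (22, 5),
    "skew": (28, 5),
}


def percent_more(linear_points, proxy_points):
    """Extra assessments used by the linear proxy, in percent."""
    return 100.0 * (linear_points - proxy_points) / proxy_points


def times_as_many(linear_points, proxy_points):
    """Ratio of linear-proxy assessments to comparison-proxy assessments."""
    return linear_points / proxy_points


if __name__ == "__main__":
    for key, (lin, cd) in SUM_OF_EXPO_OBJECTIVE.items():
        print(key, round(percent_more(lin, cd), 1), "% more")
    for shape, (lin, soe) in COBB_DOUGLASS_OBJECTIVE_3D.items():
        print(shape, round(times_as_many(lin, soe), 1), "times as many")
```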

A few basic principles can be drawn from these results. First and foremost, we
should  select  as  our  proxy  the  function  that  best  represents  the  decision  maker's  true 
preferences.  Secondly,  we  should  select  as  our  starting  point  our  best  estimate  of  the 
true  optimum.  Both  of  these  tasks  can  be  accomplished  by  using  Barrager  and  Keelin's 
global  assessment  procedures,  reviewed  in  section  2.2,  as  the  first  step  of  the  algorithm. 
In  this  first  step,  we  encode  a global  deterministic  preference  function.  Then  we  begin 
the  iterative  procedure,  using  the  global  function  as  the  local  proxy  and  its  optimum  as 
the  starting  point.  With  this  combined  method,  we  fully  exploit  the  power  of  the  global 
procedure,  yet  avoid  its  severe  restrictions.  We  never  assume,  even  in  the  small,  that 
the  proxy  is  the  true  objective;  we  merely  use  it  as  a mechanism  to  guide  the  search  to 
the  optimal  solution. 

Table  4.3  shows  a set  of  experiments  comparing  the  linear  and  Cobb-Douglass 
proxies in problems with the nonlinear constraint $\sum_i c_i x_i^2 \le b$. In each problem, the
normatively  motivated  proxy  outperforms  the  linear  proxy  by  a wide  margin.  Though 
the  following  observation  does  not  concern  us  directly,  it  is  interesting  to  note  that  the 
linear  approximation  converged  faster  with  the  ellipsoidal  constraint  than  with  the  linear 
constraint  since  the  initial  trial  solutions  at  each  iteration  were  not  outlying  extreme 
points. 

The  third  and  final  set  of  experiments  compares  Boyd's  pseudo-objective  with  the 
sum-of-exponentials  proxy  in  problems  with  several  linear  constraints.  Linear 
Table 4.3  Second Comparison of Proxy Functions

Constraint: Ellipsoidal (1)
Tolerance: 1%

                                               PROXY
PREFERENCE                        LINEAR                    COBB-DOUGLASS
FUNCTION       dimen-             # pts. req.    CPU        # pts. req.    CPU
               sion   shape       assessment     seconds    assessment     seconds

Sum-of-Expo.   3      symmetric   26             3.6        8              1.2
               3      skew        28             3.8        13             1.9
               6      symmetric   29             3.9        9              1.2
               6      skew        27             3.6        17             2.6

Cobb-Douglass  3      symmetric   18             2.3        2              0.3
               3      skew        23             3.1        2              0.3
               6      symmetric   21             2.8        2              0.3
               6      skew        27             3.6        2              0.3

Sum-of-Powers  3      symmetric   11             1.6        3              0.4
               3      skew        17             2.5        3              0.4
               6      symmetric   13             1.9        4              0.7
               6      skew        15             2.2        4              0.7

programming and convex programming routines were required since no analytic
solutions  exist  for  these  problems.  The  results  in  Table  4.4  again  show  the  new  proxy  is 
much  more  efficient  than  the  old,  requiring  only  a small  fraction  of  the  assessments  and 
CPU  time. 

In  an  attempt  to  find  a case  in  which  Boyd’s  algorithm  was  faster,  I contrived  the 
following  example: 


Maximize  $V(x) = 8x_1^{[\ldots]} + x_2^{[\ldots]} + x_3^{[\ldots]}$

subject to  $\sum_j x_j \le 100$,  $x \ge 0$.

The  results  with  the  three  proxies  are  listed  below: 


    Linear:                 2 iterations     0.3 CPU seconds
    Cobb-Douglass:          2 iterations     0.4 CPU seconds
    Sum-of-exponentials:    4 iterations     0.9 CPU seconds


The  optimal  solution,  x*  = (100,  0,  0),  is  obvious  by  inspection  since  the  objective  is 
nearly  lexicographic.  Linear  and  lexicographic  preference  structures  would  be  identified 
early  in  an  analysis  and  would  be  modeled  with  other  techniques. 

If  we  formally  analyze  the  rate  of  convergence,  our  primary  concern  is  the  initial 
rather  than  the  asymptotic  rate  since  only  a modest  number  of  interactive  iterations  can 
be  performed.  Very  little  is  known  about  the  initial  rate  of  convergence  of 
mathematical  programming  algorithms  in  general,  but  for  the  Frank-Wolfe  algorithm, 
Wolfe  [34]  and  Amor  [1]  demonstrated  the  following  result: 

If  V is  boundedly  concave  (i.e.,  V is  concave  with  continuous  second 
derivatives  on  X and  a uniform  lower  bound  on  all  eigenvalues  of  the 
Hessian) and X is a convex, compact set, then the error in the objective
function is at least halved for each of the first K iterations:

$$\frac{V(x^*) - V(x^{k+1})}{V(x^*) - V(x^k)} \le \frac{1}{2}, \qquad k < K,$$

but K remains unknown.

Wolfe claims this result should hold as long as $x^k$ is "sufficiently far" from the




Table 4.4  Third Comparison of Proxy Functions

Constraints: Linear (3)
Tolerance: 1%

                                              PROXY
PREFERENCE                      LINEAR                    SUM-of-EXPO.
FUNCTION                        # pts. req.    CPU        # pts. req.    CPU
                                assessment     seconds    assessment     seconds

Cobb-Douglass    4 skew         12+ (5% Tol.)  6.70       4              1.95
Sum-of-Powers    4 symmetric    21+ (7% Tol.)  11.51      3              1.67

optimum. However, the result is not useful since "sufficiently far" is undefined and K
is  unknown. 
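
The error measure in this result can be observed on a toy problem. The sketch below is not one of the experiments in this chapter. It runs the Frank-Wolfe iteration with an exact line search on a hypothetical concave quadratic, V(x) = -|x - target|^2, over the unit simplex, where the optimum is known to be the target itself. The list it returns holds V(x*) - V(x^k) for each iteration, so successive ratios can be inspected directly. The objective, the target, and the starting vertex are all assumptions chosen for illustration.

```python
def frank_wolfe_errors(target, x0, iterations):
    """Frank-Wolfe on V(x) = -sum((x_i - target_i)^2) over the unit simplex.

    Returns the errors V(x*) - V(x^k); since V(x*) = 0 when the target lies
    in the simplex, each error is the squared distance to the target.
    """
    x = list(x0)
    errors = []
    for _ in range(iterations):
        errors.append(sum((xi - ti) ** 2 for xi, ti in zip(x, target)))
        grad = [-2.0 * (xi - ti) for xi, ti in zip(x, target)]
        # Linear subproblem: the best vertex of the simplex is the unit
        # vector along the largest gradient component.
        best = max(range(len(x)), key=lambda i: grad[i])
        vertex = [1.0 if i == best else 0.0 for i in range(len(x))]
        direction = [vi - xi for vi, xi in zip(vertex, x)]
        norm_sq = sum(di * di for di in direction)
        if norm_sq == 0.0:
            break
        # Exact line search for a quadratic, clipped to the segment [0, 1].
        slope = sum(gi * di for gi, di in zip(grad, direction))
        step = min(1.0, max(0.0, slope / (2.0 * norm_sq)))
        x = [xi + step * di for xi, di in zip(x, direction)]
    return errors


if __name__ == "__main__":
    errs = frank_wolfe_errors([0.5, 0.3, 0.2], [1.0, 0.0, 0.0], 10)
    for k in range(len(errs) - 1):
        print(k, errs[k + 1] / errs[k])
```

The printed ratios show how quickly the early improvement fades as the iterates approach the optimum, which is why the initial rate, rather than the asymptotic rate, matters for an interactive procedure.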

Dyer [9] gives the assessment errors a stochastic interpretation. He proves that
if the error at each iteration k is a sample from a multivariate density function with mean 0 and finite
variance, then the mean initial rate of convergence of the modified Frank-Wolfe
algorithm is unchanged. The Spacer Step Theorem ensures the same initial rate of
convergence holds for each of the first K groups of steps of the proxy iteration
algorithm, but K again remains unknown.

In our proxy approach, Axioms 2.1-2.4 are the only restrictions we wish to impose
on the objective function. Even if we make further assumptions bounding the local
curvature of V, we still have an infinite set of possibilities. Establishing more concrete
results is therefore a very difficult task. Amor [1] summarizes his mathematical
analysis claiming "the initial rate of convergence is primarily determined by the initial
$x^0$ and the relationship between the directional derivatives and the local curvature of
V." His claim is entirely consistent with our intuition and the results we drew from the
experiments of this chapter. Since the normatively motivated proxies approximate the
local curvature of V(x) much better than does the linear proxy, they generate a greater
improvement at each iteration and yield a higher rate of convergence.


Chapter  V 

THE  PROXY  ITERATION  ALGORITHM 
FOR  DECISION  MAKING  UNDER  UNCERTAINTY 


5.1  The  Proxy  Approach  Under  Uncertainty 

Figure 1.1 shows that under uncertainty, the decision problem is

$$\max_{d \in D} \int_s u[x(d,s)]\,\{s \mid \varepsilon\}$$

where u is a cardinal utility function and $\{s \mid \varepsilon\}$ is the joint probability distribution
over the state variables. Using the preference decomposition approach illustrated in
Figure 2.1, the decision problem becomes

$$\max_{d \in D} \int_s u\{n[V(x(d,s))]\}\,\{s \mid \varepsilon\}$$

where  n is  an  appropriately  chosen  numeraire.  We  are  no  longer  trying  to  find  the 
most  preferred  outcome  x*;  rather,  we  are  searching  for  the  most  preferred  decision 
d*.  Therefore,  the  problem  can  no  longer  be  structured  as  a choice  of  x over  X(D), 
but  rather  as  a choice  of  d over  D. 

Axioms 2.1-2.9 guarantee the existence of a real-valued risk preference function u
and deterministic preference function V. Just as under certainty, we assume that the
global assessment procedures for u and V are too restrictive, so we try to use our
proxy approach instead.

When applying the proxy technique to this problem, we decompose deterministic
and risk preferences, first assessing deterministic tradeoffs, then selecting a numeraire
and assessing risk preference. We are trying to find the decision d* that maximizes the
expected utility $\langle u\{n[V(x(d,s))]\} \mid \varepsilon \rangle$. This composite function notation is cumbersome,
so we define the following abbreviated notation showing only the dependence of u(n)
upon the decision variables and state variables:

$$u(d,s) = u\{n[V(x(d,s))] \mid \varepsilon\}$$

Taking the expectation with respect to s,

$$\langle u(d) \rangle = \int_s u\{n[V(x(d,s))]\}\,\{s \mid \varepsilon\}$$

We also define $p[d \mid d^k, s]$ and $q[d \mid d^k, s]$ as approximations of V(d,s) and u(d,s),
respectively, fit from deterministic and risk preference assessments at $x(d^k, s)$. Similarly,
the notation $\langle q(d) \mid d^k \rangle$ represents $\int_s q\{n[p[x(d,s) \mid d^k]]\}\,\{s \mid \varepsilon\}$. The following theorem
is the analog of Theorem 3.1 for decision making under uncertainty.

Theorem 5.1. If the decision maker's preference ordering satisfies the deterministic
and risk preference axioms (2.1-2.9), if D is convex and $\langle u(d) \rangle$ is concave, if d* is
a regular point of the constraints, and if $q[d \mid d^*, s]$ is constructed so
$\nabla_d \langle q(d^*) \mid d^* \rangle = t \nabla_d \langle u(d^*) \rangle$ for some positive scalar t, then if d* maximizes
$\langle q[d \mid d^*, s] \rangle$ over all $d \in D$, then d* also maximizes $\langle u(d) \rangle$ over all $d \in D$.

The proof of Theorem 5.1 parallels the proof of Theorem 3.1. We write the
constraint set D as $D = \{d \mid h(d) = 0,\ g(d) \le 0\}$.

Proof: The regularity conditions hold for all $d \in D$, so if d* maximizes
$\langle q[d \mid d^*, s] \rangle$ over all $d \in D$, the Kuhn-Tucker necessary conditions guarantee the
existence of $\lambda$ and $\mu \ge 0$ such that

$$\nabla_d \langle q(d^*) \mid d^* \rangle + \lambda \nabla_d h(d^*) + \mu \nabla_d g(d^*) = 0.$$

By hypothesis, $\nabla_d \langle q(d^*) \mid d^* \rangle = t \nabla_d \langle u(d^*) \rangle$ for some positive scalar t; by defining
$\tau = (1/t)\lambda$ and $\nu = (1/t)\mu$, we have

$$\nabla_d \langle u(d^*) \rangle + \tau \nabla_d h(d^*) + \nu \nabla_d g(d^*) = 0, \qquad \nu \ge 0.$$

By assumption, D is convex and $\langle u(d) \rangle$ is concave, so the Second-Order Sufficiency
Conditions (Appendix C) hold at d*. Therefore d* maximizes $\langle u(d) \rangle$. Q.E.D.

Theorem 5.1 motivates the proxy iteration algorithm under uncertainty. At each
iteration k, we must fit a proxy $\langle q(d) \mid d^k \rangle$ for the true objective $\langle u(d) \rangle$. This proxy
must generate a feasible direction of improvement in the decision space D; therefore it
must satisfy the following property: for some t > 0,

$$\nabla_d \langle q(d) \mid d^k \rangle = t \nabla_d \langle u(d^k) \rangle.$$

Under uncertainty, the true objective is not a utility function; rather, it is the expected
utility of a lottery as a function of the decision d. Consequently, the proxy is not a
utility function either; it is an approximation of $\langle u(d) \rangle = \int_s u[x(d,s)]\,\{s \mid \varepsilon\}$.

At each iteration, we maximize the proxy

$$\max_{d \in D} \langle q(d) \mid d^k \rangle$$

and continue the procedure until $d^{k+1} = d^k$.

Unfortunately, we do not know how to construct the proxies $\langle p[d \mid d^k] \rangle$ and
$\langle q[d \mid d^k] \rangle$ to guarantee that

$$\nabla_d \langle p[d \mid d^k] \rangle = t \nabla_d \langle V(d^k) \rangle \qquad (5.1)$$

for some t > 0, for the expected-value decision maker, and that

$$\nabla_d \langle q[d \mid d^k] \rangle = t \nabla_d \langle u(d^k) \rangle \qquad (5.2)$$

for the risk-sensitive decision maker. As a result, we cannot guarantee that each
iteration makes an improvement in the true objective $\langle u(d) \rangle$. In the deterministic
problem, the proxy p(x) always provided a direction of improvement in the outcome
space X(D) since $\nabla p(x^k)$ was collinear with $\nabla V(x^k)$; by construction,
$\nabla_x p(x^k) = t \nabla_x V(x^k)$ for some t > 0. Under uncertainty, however, the decision maker must choose
among lotteries rather than among deterministic outcomes since each decision vector d
produces the outcome lottery $\{x \mid d, \varepsilon\}$ as shown in Figure 1.1. Deterministic and risk
preferences must be assessed over the outcome variables, not the decision variables.
Constructing the proxies to satisfy (5.1) and (5.2) is difficult since $d^k$ corresponds
not to one outcome $x^k$, but to a probability distribution over the outcome space X(D).

A scheme must be devised for choosing the outcome at each iteration where the
parameters are to be assessed so the resulting proxies satisfy (5.1) and (5.2). Such a
scheme would guarantee a direction of improvement in $\langle u(d) \rangle$ at each iteration k; it
would specify a decision $\hat{d}^{k+1} \in D$ such that $\langle u(d^{k+1}) \rangle > \langle u(d^k) \rangle$, where
$d^{k+1} = \alpha d^k + (1-\alpha)\hat{d}^{k+1}$ for some $\alpha$, $0 \le \alpha < 1$. The direction of improvement required in
the decision set D does not correspond to any meaningful direction of improvement in
the outcome space X(D) since $d^k$ and $d^{k+1}$ do not correspond to any unique $x^k$
and $x^{k+1}$.
5.2  A Scheme for Fitting the Proxies

A natural point for assessing the proxy parameters is the conditional expectation
$\langle x \mid d, \varepsilon \rangle$. The following example shows, however, that when fitting the proxies at
$\langle x \mid d, \varepsilon \rangle$, the algorithm may fail to generate a direction of improvement at each
iteration. The example uses an expected-value decision maker with one decision variable
and one state variable. The decision maker's problem is:

$$\max_{d \in D} \int_s V[x(d,s)]\,\{s \mid \varepsilon\}$$

where V(x) is unknown and cannot be assessed. In this example, we use a
Cobb-Douglass proxy for a sum-of-exponentials deterministic preference function.

Deterministic preference function:  $V(x) = -e^{-0.3x_1} - e^{-0.4x_2}$   (5.3)

Probability distribution on s:  $\{s \mid \varepsilon\} = 2s/3$ for $1 \le s \le 2$, 0 elsewhere

System model:  $x_1(d,s) = (d - d^2)s$ ;  $x_2(d,s) = ds$

Constraint:  $D = \{d \mid 0 \le d \le 1\}$

The expected value of the true objective, as a function of d, is:

$$\langle V(d) \rangle = -\frac{2}{3} \int_1^2 \left[ s\,e^{-0.3(d-d^2)s} + s\,e^{-0.4ds} \right] ds.$$

After integrating by parts, the definite integral is written as a function of d alone:

$$\langle V(d) \rangle = \frac{-7.4074}{(d-d^2)^2} \Big( e^{-0.6(d-d^2)}\,[-0.6(d-d^2) - 1] - e^{-0.3(d-d^2)}\,[-0.3(d-d^2) - 1] \Big)
 + \frac{-4.1667}{d^2} \Big( e^{-0.8d}\,[-0.8d - 1] - e^{-0.4d}\,[-0.4d - 1] \Big) \qquad (5.4)$$

Figure 5.1 is a graph of $\langle V(d) \rangle$ for $0.01 \le d \le 0.99$.
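
As a check on the integration by parts, the closed form (5.4) can be compared with a direct numerical integration of the expected value. The sketch below assumes the preference function V(x) = -exp(-0.3 x1) - exp(-0.4 x2), which is the form implied by the integrand of the expected value, together with the density 2s/3 on [1, 2] and the system model given above. The function names and the midpoint rule are choices made only for illustration.

```python
from math import exp


def expected_value_closed_form(d):
    """Equation (5.4): <V(d)> after integrating by parts."""
    a = d - d * d
    term1 = (-7.4074 / a ** 2) * (
        exp(-0.6 * a) * (-0.6 * a - 1.0) - exp(-0.3 * a) * (-0.3 * a - 1.0)
    )
    term2 = (-4.1667 / d ** 2) * (
        exp(-0.8 * d) * (-0.8 * d - 1.0) - exp(-0.4 * d) * (-0.4 * d - 1.0)
    )
    return term1 + term2


def expected_value_quadrature(d, n=2000):
    """Midpoint rule for the integral over s in [1, 2] of V(x(d,s)) * (2s/3)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = 1.0 + (i + 0.5) * h
        x1 = (d - d * d) * s
        x2 = d * s
        total += (-exp(-0.3 * x1) - exp(-0.4 * x2)) * (2.0 * s / 3.0) * h
    return total


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.632, 0.8):
        print(d, expected_value_closed_form(d), expected_value_quadrature(d))
```

The two columns agree to within the rounding of the constants 7.4074 and 4.1667.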

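The Cobb-Douglass proxy, in its additive form, is

$$p[x(d,s)] = \beta \ln x_1(d,s) + (1-\beta) \ln x_2(d,s) \qquad (5.5)$$

Fitting the proxy at the conditional expectation $\langle x \mid d, \varepsilon \rangle$,

$$\langle x_1 \mid d, \varepsilon \rangle = \int_s x_1(d,s)\,\{s \mid \varepsilon\} = (14/3)(d - d^2) \qquad (5.6)$$

$$\langle x_2 \mid d, \varepsilon \rangle = \int_s x_2(d,s)\,\{s \mid \varepsilon\} = (14/3)d \qquad (5.7)$$

Using the sum-of-exponentials preference function (5.3) to generate tradeoffs
at $\langle x \mid d, \varepsilon \rangle$,

$$\lambda_2 = -dx_1/dx_2 = (4/3)\,e^{0.3x_1 - 0.4x_2} \qquad (5.8)$$

For the Cobb-Douglass proxy (5.5),

$$\frac{\partial p(x)/\partial x_2}{\partial p(x)/\partial x_1} = -dx_1/dx_2 = \frac{(1-\beta)\,x_1}{\beta\,x_2},$$

so

$$\beta = \frac{x_1}{x_1 + \lambda_2 x_2} \qquad (5.9)$$

The expectation of the Cobb-Douglass approximation fit at $\langle x \mid d, \varepsilon \rangle$ is then calculated
to find the proxy $\langle p(d) \rangle$ for the true objective $\langle V(d) \rangle$:

$$\langle p(d) \rangle = 0.6667 \int_1^2 \left[ \beta s \ln((d-d^2)s) + (1-\beta) s \ln(ds) \right] ds.$$

After integrating by parts, this definite integral is written as a function of d:

$$0.6667\,\beta \big( [2\ln(2d - 2d^2) - 1] - [0.5\ln(d - d^2) - 0.25] \big)
 + 0.6667\,(1-\beta) \big( [2\ln(2d) - 1] - [0.5\ln(d) - 0.25] \big)$$

Maximizing $\langle p(d) \rangle$ with respect to d,

$$\partial \langle p(d) \rangle / \partial d = 0 \quad \text{at} \quad d^{\circ} = 1/(1+\beta). \qquad (5.10)$$

For $0 < \beta < 1$, we have $0.5 < d^{\circ} < 1$, so $d^{\circ} \in D$.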
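
The stationary point in (5.10) can be confirmed numerically. The sketch below evaluates the integrated proxy on a grid of decisions and compares the best grid point with 1/(1+beta). The grid resolution, the function names, and the sample values of beta are assumptions made for illustration.

```python
from math import log


def expected_proxy(d, beta):
    """Integrated Cobb-Douglass proxy <p(d)> for the density 2s/3 on [1, 2]."""
    a = d - d * d
    first = (2.0 * log(2.0 * a) - 1.0) - (0.5 * log(a) - 0.25)
    second = (2.0 * log(2.0 * d) - 1.0) - (0.5 * log(d) - 0.25)
    return 0.6667 * beta * first + 0.6667 * (1.0 - beta) * second


def grid_maximizer(beta, steps=1000):
    """Best decision on the grid 1/steps, 2/steps, ..., (steps-1)/steps."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda d: expected_proxy(d, beta))


if __name__ == "__main__":
    for beta in (0.2, 0.582, 0.9):
        print(beta, grid_maximizer(beta), 1.0 / (1.0 + beta))
```

Because the integrated proxy is concave in d, the grid maximizer lies within one grid step of the analytic value.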

Equations  (5.1)  through  (5.10)  and  Figure  5.1  provide  the  information  needed  to 
begin  the  algorithm. 

Choose initial $d^0 = 0.8$.

Fit proxy at $\langle x \mid d^0, \varepsilon \rangle$. Using (5.6) and (5.7),

$$\langle x \mid d^0 = 0.8, \varepsilon \rangle = (0.7467,\ 0.3733).$$

Using (5.8) and (5.9),

$$\lambda_2 = 1.4367 \quad \text{and} \quad \beta = 0.5820,$$

so the proxy for the true objective is

$$\langle p(d) \mid d^0 = 0.8 \rangle = 0.6667 \int_1^2 \left[ 0.582\,s \ln((d-d^2)s) + 0.418\,s \ln(ds) \right] ds.$$

Equation (5.10) shows

$$d^1 = \arg\max_d \langle p(d) \mid d^0 = 0.8 \rangle = 0.632.$$

However,

$$\langle V(d) \mid d^0 = 0.8 \rangle = -1.5385 \quad \text{and} \quad \langle V(d) \mid d^1 = 0.632 \rangle = -1.5739,$$

so $d^1$ is worse than $d^0$. Figure 5.1 shows the expected value of the true objective
decreases monotonically from $d^0 = 0.8$ to $d^1 = 0.632$; no step in this direction, no
matter how small, will improve $\langle V(d) \rangle$. Therefore, the algorithm is not globally
convergent when the proxy is fit at the conditional expectation $\langle x \mid d, \varepsilon \rangle$.
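
The iteration above can be retraced in a few lines. The sketch below takes the fitting point (0.7467, 0.3733) exactly as quoted in the text and applies (5.8), (5.9), and (5.10). It then evaluates the true expected value at both decisions by numerical integration. It assumes V(x) = -exp(-0.3 x1) - exp(-0.4 x2), the form implied by the integrand of the expected value; the variable names and the midpoint rule are illustrative choices.

```python
from math import exp


def expected_true_value(d, n=2000):
    """Midpoint rule for the integral over s in [1, 2] of V(x(d,s)) * (2s/3)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = 1.0 + (i + 0.5) * h
        x1 = (d - d * d) * s
        x2 = d * s
        total += (-exp(-0.3 * x1) - exp(-0.4 * x2)) * (2.0 * s / 3.0) * h
    return total


# Fitting point quoted in the text for d0 = 0.8.
x1_fit, x2_fit = 0.7467, 0.3733

lambda2 = (4.0 / 3.0) * exp(0.3 * x1_fit - 0.4 * x2_fit)   # tradeoff, (5.8)
beta = x1_fit / (x1_fit + lambda2 * x2_fit)                # proxy parameter, (5.9)
d0 = 0.8
d1 = 1.0 / (1.0 + beta)                                    # proxy maximizer, (5.10)

v0 = expected_true_value(d0)
v1 = expected_true_value(d1)

if __name__ == "__main__":
    print("lambda2 =", round(lambda2, 4), "beta =", round(beta, 4))
    print("d1 =", round(d1, 3))
    print("<V(d0)> =", round(v0, 4), "<V(d1)> =", round(v1, 4))
```

The second expected value is lower than the first, which is the failure of improvement that the example is meant to exhibit.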

Perhaps  some  other  proxy  fitting  scheme  can  be  found  that  guarantees  improvement 
at  each  iteration.  Finding  such  a scheme  is  a difficult  task  since  the  resulting 
approximation  of  u(x)  must  still  be  integrated  over  all  outcomes  in  the  lottery 
$\{x \mid d, \varepsilon\}$. This task remains the subject of future research.

Boyd [4] tried to use his linear approximation technique for decision making under
uncertainty. Instead of fitting the approximation at one point, he tried approximating
the risk preference function at all outcomes in the lottery $\{x \mid d, \varepsilon\}$. At each iteration k,
his method requires the assessment of two new parameters, $\rho[x(d^k,s)]$ and

$$\gamma[x(d^k,s)] = \left[ du(n)/dn \right]_{x(d^k,s)},$$

in addition to $\lambda$ at every possible realization of s. The resulting assessment
requirements are enormous: when $\{s \mid \varepsilon\}$ is continuous, an infinite number of
assessments is required; when $\{s \mid \varepsilon\}$ is discrete, the assessment demands are generally
prohibitive since even the simplest binary probability distribution would require six
times as many assessments as its deterministic analog (including $\rho$ and $\gamma$).
Moreover, the assessment of $\gamma$ may be a bewildering task for the decision maker.
Consequently, Boyd's procedure is not practical for decision making under uncertainty.

Even if a successful proxy fitting scheme could be found, the analyst must still ask
the decision maker at each iteration if the new trial solution is an improvement; at the
k-th iteration, the decision maker would be asked to choose between the lotteries
$\{x \mid d^k, \varepsilon\}$ and $\{x \mid d^{k+1}, \varepsilon\}$. This direct comparison of lotteries requires simultaneous
consideration of deterministic tradeoffs and risk attitudes. It is a difficult task for the
decision maker to perform, but cannot be avoided since improvement must be
guaranteed at each iteration. As a result of these obstacles, the current version of the
proxy iteration algorithm is not applicable to problems in which uncertainty plays a
major role.

Although  the  iterative  local  procedure  is  not  well  suited  for  decision  making  under 
uncertainty,  other  local  preference  modeling  techniques  may  be  useful.  Perhaps  the 
analyst  can  find  a "less  restrictive"  global  preference  function  by  assessing  local 
preference  models  in  various  decision  regions,  in  the  small,  and  "piecing  them  together". 
Some  interpolation  scheme  would  be  needed  to  guarantee  that  the  conglomerate  global 
function satisfies the risk preference axioms (2.5-2.9). This research has motivated other
work  in  this  direction  (in  the  Stanford  Decision  Analysis  Research  Program)  and  initial 
results  look  promising. 


Chapter  VI 

PRACTICAL  APPLICATION  OF 
THE  PROXY  ITERATION  ALGORITHM 


6.1  The  Decision  Problem 

The true practical test of any decision-making procedure is a real problem. In this
chapter, I describe the application of the proxy iteration algorithm to a curriculum
design problem for the upper grades of a combined elementary and junior high school.

The  Shepherd  School  is  a small  private  school  in  San  Jose,  California.  It  includes  a 
lower  school,  preschool  through  fourth  grade,  and  an  upper  school,  fifth  grade  through 
eighth  grade.  Mrs.  Wanda  Grenke  is  the  head  teacher,  principal,  and  administrator  at 
Shepherd  School.  She  is  currently  planning  the  curriculum  for  the  upper  school  and 
wants  to  choose  the  best  combination  of  subjects  to  teach  the  students  during  classroom 
hours. Her task is a resource allocation problem: how should she allocate the weekly
class hours (9:00 am to 3:00 pm, Monday through Friday) to the different subjects:
reading, language arts, arithmetic, social studies, science, foreign language, art, music, and
physical education? Mrs. Grenke would like to emphasize reading, writing, and
arithmetic;  even  so,  she  still  may  choose  from  an  infinite  set  of  possible  allocations. 
Any  allocation  she  chooses  must  provide  for  the  following  fixed  requirements: 

1.  A fifteen  minute  recess  in  the  morning  and  a forty-five  minute 
lunch-recess  at  midday. 

2.  One  period  for  chapel  every  Monday  morning,  required  by  the 
Board  of  Trustees;  the  exact  length  of  this  period  is  not  specified,  but  Mrs. 
Grenke  indicates  that  student  concentration  seems  to  dwindle  after  one  hour. 

She  prefers  academic  periods  of  40  to  50  minutes. 

3.  Three  periods  of  physical  education  per  week  (a  previously 
established  school  policy). 

4.  One  period  per  week  for  special  projects  and  class  meetings. 
Mrs. Grenke also faces constraints on her teaching staff. The Shepherd School, like
many  other  private  schools,  has  a tight  operating  budget;  as  a result,  the  school  has  a 
very  limited  number  of  teaching  positions,  some  full-time  and  some  part-time.  The 
staff  members  who  teach  in  the  upper  school  are  listed  below.  For  each  teacher,  the  list 
includes  the  subjects  taught  and  the  number  of  hours  per  week  he  or  she  can  teach. 

1.  Mrs.  Grenke:  Reading,  language  arts,  and  social  studies.  Mrs. 
Grenke  is  a full-time  employee,  but  her  duties  as  principal  and  administrator 
require  several  hours  during  each  school  day.  She  can  devote  no  more  than  16 
hours  per  week  to  classroom  instruction. 

2.  Mr.  Herriman:  Mathematics,  science,  physical  education,  musical 
instruments.  Mr.  Herriman  is  a full-time  staff  member.  He  teaches  math, 
science,  and  physical  education  to  the  upper  school,  and  musical  instruments  to 
the  lower  school.  After  subtracting  time  for  the  fixed  physical  education  and 
music  requirements,  he  has  19.5  hours  remaining  during  which  he  can  teach 
math  and  science  in  the  upper  school. 

3.  Mrs.  Findlay:  Reading,  language  arts,  social  studies.  Mrs.  Findlay 
is  a part-time  staff  member  working  15  hours  per  week  in  the  upper  school. 

Her  hours  are  flexible,  but  must  occur  in  continuous  blocks  each  day  without 
free  periods  interspersed. 

4.  Mrs.  Williams:  Foreign  languages.  Mrs.  Williams  is  a part-time 
instructor  working  in  the  upper  school  3 hours  each  week.  She  is  available 
only  on  Tuesdays  and  Thursdays  midday  to  early  afternoon. 

5.  Ms.  Fyfe:  Music.  Ms.  Fyfe  is  the  full-time  kindergarten  teacher 
(lower school) and the music teacher for all grades. The kindergarten closes at
lunchtime  on  Mondays,  Wednesdays,  and  Fridays,  so  Ms.  Fyfe  is  available  these 
afternoons  to  teach  music  to  the  other  grades. 

6.  Mrs.  Blanchard:  Art.  Mrs.  Blanchard  is  the  full-time  third-fourth 
grade  teacher  and  the  art  teacher  for  the  upper  school.  She  is  available  to 
teach art in the upper school on Monday, Wednesday, and Friday afternoons
when Ms. Fyfe instructs her combination third-fourth grade.

No  extra  funds  are  available  for  increasing  the  staff  or  the  number  of  hours  of  the 
part-time  instructors.  Mrs.  Grenke  must  design  the  curriculum  given  the  teaching  staff 
listed  above. 

The  upper  school  is  divided  into  a fifth-sixth  grade  combination  and  a 
seventh-eighth  grade  combination.  These  combined  classes  are  well  suited  for  language 
arts,  social  studies,  science,  foreign  language,  art,  and  music  since  the  skills  required  for 
these  subjects  are  not  grade  specific.  For  reading  and  arithmetic,  however,  the 
combination  classes  do  not  work  well.  The  Shepherd  School  uses  the  Lippincott  Reading 
Program  and  Field  Mathematics  Program;  these  programs  are  nationally  recognized  series 
of  texts  for  elementary  and  junior  high  school  instruction.  The  Lippincott  and  Field 
texts  are  specifically  geared  for  the  individual  grade  levels  and  cannot  be  mixed 
effectively  for  the  combined  classes.  The  Field  series  comprises  Shepherd's  entire 
mathematics  program;  the  Lippincott  series  comprises  about  half  of  Shepherd's  reading 
program.  As  a result,  all  mathematics  and  at  least  half  the  reading  classes  must  be  taught 
separately  to  the  fifth,  sixth,  seventh,  and  eighth  grades. 

Mrs.  Grenke  believes  the  Shepherd  School's  most  important  academic  function  is 
the  development  of  its  students'  verbal  and  quantitative  skills.  Consequently,  she  wants 
to  emphasize  the  reading  and  arithmetic  programs,  but  at  the  same  time,  she  wants  to 
provide  a well-balanced  education.  Just  how  much  of  each  subject  should  she  include  in 
the upper school curriculum? In section 6.2, I construct a model of her decision
problem  and  in  section  6.3,  I apply  the  proxy  iteration  algorithm  to  help  her  find  the 
solution. 

6.2  Modeling  the  Decision  Problem 

The  success  or  failure  of  our  decision-making  procedure  (and  the  global  procedure 
as  well)  depends  on  the  decision  maker's  ability  to  provide  tradeoffs  that  adequately 
reflect  his  or  her  underlying  preferences.  The  decision  maker  can  respond  to  the 
assessment questions consistently only if the analyst models the outcomes with objective,
well-defined, and unambiguous attributes. The attributes must be concrete enough so
the  decision  maker  can  easily  visualize  any  particular  outcome  vector,  can  clearly 
distinguish  one  outcome  from  another,  and  can  confidently  express  preferences  among 
outcomes. 

In  the  Shepherd  School  problem,  we  must  select  concrete  attributes  that  provide  an 
accurate  and  complete  description  of  the  upper  school  curriculum.  Mrs.  Grenke  has 
been  a teacher  for  many  years;  she  has  prepared  and  taught  thousands  of  lesson  plans, 
mostly  for  forty-five  minute  class  periods.  She  can  clearly  visualize  a curriculum 
described  by  the  number  of  weekly  class  periods  of  each  subject.  Furthermore,  she  is 
willing  to  choose  between  any  two  curricula  by  comparing  the  amounts  of  the  various 
subjects.  To  model  her  problem,  I divide  each  school  day  into  six  class  periods,  each 
lasting  45  minutes.  I allow  three  minutes  between  classes,  a fifteen  minute  recess 
between  periods  two  and  three,  and  a forty-five  minute  lunch-recess  between  periods 
four  and  five  (in  accordance  with  the  fixed  requirements  enumerated  in  Section  6.1). 
These  fixed  requirements  also  consume  five  class  periods  each  week;  one  for  chapel, 
three  for  physical  education,  and  one  for  special  topics  and  class  projects.  An  additional 
fixed  requirement  arises  from  the  limited  part-time  hours  of  Mrs.  Williams,  the  foreign 
language  instructor.  Mrs.  Williams  can  spend  only  two  periods  with  the  fifth-sixth 
grade  and  two  periods  with  the  seventh-eighth  grade  each  week.  She  will  not  work  less 
than  three  hours  (four  periods)  per  week  since  the  resulting  compensation  would  make 
the  arrangement  uneconomical.  Mrs.  Grenke  definitely  wants  to  include  French  in  the 
curriculum,  so  she  must  allocate  exactly  two  periods  for  it  each  week.  These  fixed 
requirements  for  chapel,  physical  education,  class  projects,  and  foreign  language  use  up 
seven  of  the  thirty  class  periods  each  week.  Twenty-three  periods  remain,  to  be  divided 
among  reading,  language  arts,  arithmetic,  social  studies,  science,  art  and  music.  For  the 
purpose  of  curriculum  planning,  Mrs.  Grenke  groups  art  and  music  instruction  together 
as  a combined  program.  Consequently,  I combine  them  as  one  subject  in  the  decision 
model  and  represent  the  upper  school  curriculum  (grades  five  through  eight)  by  the 
following six attributes:

$x_1$ = number of 45-minute periods of reading per week,

$x_2$ = number of 45-minute periods of language arts per week,

$x_3$ = number of 45-minute periods of arithmetic per week,

$x_4$ = number of 45-minute periods of social studies per week,

$x_5$ = number of 45-minute periods of science per week,

$x_6$ = number of 45-minute periods of art/music per week.

These  six  attributes,  together  with  the  fixed  requirements,  completely  specify  the 
weekly  curriculum. 

Modeling  the  Constraints 

The  fixed  requirements  consume  7 of  the  30  weekly  classroom  periods,  leaving  23  to 
be  allocated  to  these  6 subjects.  This  time  constraint  can  be  represented  quantitatively  in 
terms  of  the  outcome  attributes: 

$$x_1 + x_2 + x_3 + x_4 + x_5 + x_6 \le 23.$$

The  constraint  is  written  as  an  inequality  since  study  halls  could  fill  any  slack. 

The  limitations  on  the  teaching  staff  further  constrain  the  curriculum  since  certain 
subjects are taught only by certain teachers. Mrs. Grenke and Mrs. Findlay teach all
reading,  language  arts,  and  social  studies  classes  in  the  upper  school.  Their  combined 
teaching  hours  are  equivalent  to  38  forty-five  minute  periods  per  week.  They  can  teach 
language  arts,  social  studies,  and  half  the  reading  program  to  the  fifth-sixth  and 
seventh-eighth  grade  combination  classes.  The  other  half  of  the  reading  program,  keyed 
to  the  Lippincott  readers,  requires  instruction  at  four  separate  levels.  The  following 
inequality  represents  this  teaching  arrangement: 

$$3x_1 + 2x_2 + 2x_4 \le 38.$$

It  implies  that  the  total  number  of  reading,  language  arts,  and  social  studies  classes  (at  all 
levels)  cannot  exceed  Mrs.  Grenke's  and  Mrs.  Findlay's  38  teaching  periods  per  week. 
The coefficients indicate the number of levels at which each subject is taught (the
coefficient for reading is the average number of levels).

Mr. Herriman handles the entire math and science program of the upper school.
The 19.5 hours he has available for these two subjects are equivalent to 24.5 forty-five
minute periods (allowing for three minutes between classes). Science lessons are taught
to the fifth-sixth and seventh-eighth grade combined classes, but the Field Mathematics
Program requires four individual levels. The following inequality constraint indicates
that the sum of all math and science periods cannot exceed Mr. Herriman's 24.5 available
teaching periods:

$$4x_3 + 2x_5 \le 24.5.$$

The fixed requirements, for which no tradeoffs are possible, use up 5.5 of his 30 weekly
periods.

Mrs. Blanchard and Ms. Fyfe, the art and music instructors, can teach in the upper
school  during  the  two  afternoon  periods  three  days  a week.  However,  only  one  of  these 
teachers  can  teach  in  the  upper  school  at  a time  since  one  must  take  the  third-fourth 
grade  when  the  other  goes  to  the  upper  school.  Together  they  can  teach  the  fifth-sixth 
and  seventh-eighth  combined  grades  six  times  a week.  The  following  inequality 
indicates  that  the  total  number  of  upper  school  art  and  music  classes  cannot  exceed  their 
6 available  periods  per  week: 

$$2x_6 \le 6.$$

By definition, a nonnegativity constraint holds for each attribute:

$$x_i \ge 0, \quad i = 1, \dots, 6.$$

Mrs.  Grenke  indicated  that  she  could  eliminate  from  consideration  many  allocations 
that  were  unquestionably  inferior.  Without  hesitation,  she  eliminated  all  curricula  that 
failed  to  meet  the  following  specifications: 

A. Reading, language arts, and arithmetic at least three times a week,
but not more than twice a day

B. Social studies at least once a week, but not more than once a day

C. Science not more than once a day

D. Art or music at least once a week, but not more than once a day

Writing these specifications as lower and upper bounds on the subject attributes, we
have the following additional constraints:

$$3 \le x_j \le 10, \; j = 1, 2, 3; \qquad 1 \le x_k \le 5, \; k = 4, 6; \qquad 0 \le x_5 \le 5.$$

Putting  all  the  constraints  on  class  time  and  teaching  staff  together  with  the  bounds 
on  the  attributes,  we  define  their  intersection  as  the  decision  set  X(D): 

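X(D) is the set of all x such that

$$
\begin{aligned}
x_1 + x_2 + x_3 + x_4 + x_5 + x_6 &\le 23, \\
3x_1 + 2x_2 + 2x_4 &\le 38, \\
4x_3 + 2x_5 &\le 24.5, \\
2x_6 &\le 6, \\
3 \le x_1 &\le 10, \\
3 \le x_2 &\le 10, \\
3 \le x_3 &\le 10, \\
1 \le x_4 &\le 5, \\
0 \le x_5 &\le 5, \\
1 \le x_6 &\le 5.
\end{aligned}
$$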
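As an illustration, the constraints stated in this section can be checked mechanically. The following Python sketch is not part of the original analysis; the function name `in_decision_set` and the numerical tolerance are assumptions introduced here. It tests a candidate allocation against the time, staffing, and bound constraints described above.

```python
def in_decision_set(x, tol=1e-9):
    """Return True if x = (x1, ..., x6) satisfies the constraints of this section.

    The tolerance `tol` is an assumption added to absorb floating-point rounding.
    """
    x1, x2, x3, x4, x5, x6 = x
    lower = (3, 3, 3, 1, 0, 1)      # lower bounds on x1..x6
    upper = (10, 10, 10, 5, 5, 5)   # upper bounds on x1..x6
    structural = [
        x1 + x2 + x3 + x4 + x5 + x6 <= 23 + tol,   # weekly class time
        3 * x1 + 2 * x2 + 2 * x4 <= 38 + tol,      # Mrs. Grenke and Mrs. Findlay
        4 * x3 + 2 * x5 <= 24.5 + tol,             # Mr. Herriman
        2 * x6 <= 6 + tol,                         # art and music instructors
    ]
    bounds = [lo - tol <= v <= hi + tol for v, lo, hi in zip(x, lower, upper)]
    return all(structural) and all(bounds)


print(in_decision_set((4.5, 4, 4.5, 4, 3, 3)))   # True
print(in_decision_set((0, 19, 4, 0, 0, 0)))      # False: violates the attribute bounds
```

A check of this kind is convenient for confirming that each trial solution produced by an iterative procedure remains in X(D) before it is shown to the decision maker.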

Modeling  the  objective 

Mrs. Grenke's preferences among the possible curricula are transitive and she is
willing to make tradeoffs among the six attributes. These tradeoffs are well-defined
since each attribute is continuous; any class period can be divided into fractional parts.
At any $x \in X(D)$, she prefers more of each attribute to less, and her preferences satisfy
Axiom 2.4 (decreasing marginal rates of substitution). Thus, her preference ordering
over these six attributes restricted to X(D) satisfies the deterministic preference axioms
(2.1-2.4) and can therefore be represented by a concave deterministic preference
function V(x).

Mrs. Grenke's decision problem is:

$$\max_{x \in X(D)} V(x).$$

In the next section, I use the proxy iteration algorithm to solve this multi-attribute
problem.

Four of the six teachers have constraints not only on the total number of hours
they can teach in the upper school, but also on specific hours on certain days of the week.
Keeping track of the individual hours of each teacher in the decision model would be a
cumbersome task and the resulting optimization algorithm would be horribly complex.
Instead, I first solve the allocation problem for $x \in X(D)$, and then verify that the
solution  meets  the  specific  hourly  requirements  of  each  teacher.  In  section  6.4,  I develop 
a class  schedule  for  the  upper  school  that  implements  the  optimal  curriculum  and 
satisfies  all  the  time  and  staff  constraints. 

6.3  Applying  the  Proxy  Algorithm 
Selecting  the  proxy 

Before beginning the iterative procedure, we must select the form of the proxy
function that we will use at each iteration. We may choose the
sum-of-exponentials, sum-of-powers, or Cobb-Douglas function, or even a
heterogeneous combination for different attributes. We use the global modeling
procedure of Keelin [20] and Barrager [3] to select the functional form that best
represents the decision maker's preferences. Our proxy, therefore, has the same form as
their global preference function. Using this global procedure, we assume deterministic
additivity, assess tradeoffs, and use the estimating formula (B.1) to approximate the
marginal value reduction coefficients $z_1, z_2, \dots, z_6$ (see Appendix B). After assessing
$z_i$ at a number of points, we must select one of the following three models of $z_i(x_i)$:


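$$
\begin{aligned}
z_i(x_i) = w_i \quad &\Rightarrow \quad v_i(x_i) = -a_i e^{-w_i x_i} \\
z_i(x_i) = (1 + \alpha_i)/x_i \quad &\Rightarrow \quad v_i(x_i) = -a_i x_i^{-\alpha_i} \\
z_i(x_i) = 1/x_i \quad &\Rightarrow \quad v_i(x_i) = a_i \ln x_i
\end{aligned}
$$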

Even if none of the models fits the data, we must still choose one of them since the
global procedure can handle only these preference forms. Herein lies a major
disadvantage of global preference modeling: the global procedure forces the preferences
to fit the function rather than modeling the function to fit the preferences.

For each attribute $x_i$, we must choose the best model of $z_i(x_i)$. We illustrate the
procedure for $z_2(x_2)$, using $x_1$ as the price variable.

A. Choose a small increment $\Delta x_2$ for the tradeoff assessment questions. Ideally,
$\Delta x_2$ should be the smallest increment over which the decision maker can express
meaningful preferences. This optimal increment size is determined experimentally;
initially we begin with $\Delta x_2 = 1$.

B. Choose four ordered pairs $(x_1^A, x_2^A)$, $(x_1^B, x_2^B)$, $(x_1^C, x_2^C)$, $(x_1^D, x_2^D)$,
where $x_1^A$ and $x_2^A$ are nominal levels of $x_1$ and $x_2$, $x_1^A = x_1^B = x_1^C = x_1^D$, and
$x_2^B = x_2^A + \Delta x_2$, $x_2^C = x_2^B + \Delta x_2$, $x_2^D = x_2^C + \Delta x_2$.

C. Assess $\lambda_2(x_1, x_2)$ at pairs A, B, C, and D by finding the $\Delta x_1$ at which

$$(x_1 - \Delta x_1, \; x_2 + \Delta x_2) \sim (x_1, x_2).$$

D. Estimate $z_2(x_2)^{AB}$ using equation (B.1):

$$z_2(x_2)^{AB} \approx \frac{1}{\Delta x_2} \ln \frac{\lambda_2(x_1, x_2)^A}{\lambda_2(x_1, x_2)^B}.$$

Estimate $z_2(x_2)^{BC}$ and $z_2(x_2)^{CD}$ by the same technique.
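Steps A through D reduce to a short calculation once the tradeoffs have been assessed. The following Python sketch is an illustration only; the function name `estimate_z` is hypothetical, and the input values are the language arts tradeoffs reported in Table 6.1.

```python
import math


def estimate_z(tradeoffs, increment):
    """Estimate z between successive assessment points.

    Applies z ~ (1 / increment) * ln(lambda_A / lambda_B) to each pair of
    neighboring tradeoffs, following the form of equation (B.1) quoted above.
    """
    return [
        math.log(a / b) / increment
        for a, b in zip(tradeoffs, tradeoffs[1:])
    ]


# Tradeoffs assessed at (7,3), (7,4), (7,5), (7,6) with an increment of one period.
z2 = estimate_z([1.0, 0.6, 0.4, 0.2], 1.0)
print([round(z, 2) for z in z2])   # [0.51, 0.41, 0.69]
```

Four assessed tradeoffs yield three estimates, one for each pair of neighboring points.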

To estimate $z_2$, we use an increment of one language arts class period at the
following $(x_1, x_2)$ pairs: (7,3), (7,4), (7,5), and (7,6). Table 6.1 shows the tradeoff
assessments and the estimates of $z_2$. We observe that $z_2(3) = 0.51$, $z_2(4) = 0.41$, and
$z_2(5) = 0.69$; $z_2$ is neither a constant, an increasing, nor a decreasing function of $x_2$.
However, the model in which $z_2$ is a constant fits the data better than those in which
$z_2$ is decreasing. Therefore we use the exponential form $-a_2 e^{-w_2 x_2}$ for attribute $x_2$ in
our proxy function.

The same technique was used to estimate $z_3$. Table 6.1 shows $z_3(4) = 0.69$,
$z_3(5) = 0.41$, and $z_3(6) = 0.69$. Again, none of the models provides a very good fit, but the
constant parameter form is the best of the three.
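The text does not say which criterion was used to judge the fit of the three models. As one possible reading, the sketch below compares them by least squares; the criterion, the function names, and the treatment of $1 + \alpha_i$ as a single free constant are all assumptions made for illustration.

```python
def sse_constant(xs, zs):
    """Squared error of the best constant model z(x) = w."""
    w = sum(zs) / len(zs)
    return sum((z - w) ** 2 for z in zs)


def sse_power(xs, zs):
    """Squared error of the best model z(x) = c / x, with c = 1 + alpha free."""
    c = sum(z / x for x, z in zip(xs, zs)) / sum(1 / x ** 2 for x in xs)
    return sum((z - c / x) ** 2 for x, z in zip(xs, zs))


def sse_log(xs, zs):
    """Squared error of the parameter-free model z(x) = 1 / x."""
    return sum((z - 1 / x) ** 2 for x, z in zip(xs, zs))


# Estimates of z_2 and z_3 as reported in Table 6.1.
data = {
    "z2": ([3, 4, 5], [0.51, 0.41, 0.69]),
    "z3": ([4, 5, 6], [0.69, 0.41, 0.69]),
}
for name, (xs, zs) in data.items():
    scores = {
        "constant": sse_constant(xs, zs),
        "power": sse_power(xs, zs),
        "log": sse_log(xs, zs),
    }
    print(name, min(scores, key=scores.get))
```

Under this criterion the constant model has the smallest squared error for both attributes, which agrees with the choice made in the text, although none of the three fits closely.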

For $z_4$ and $z_5$, I asked the tradeoff questions using $\Delta x_4 = 0.5$ and $\Delta x_5 = 0.5$.
The nominal levels of the attribute pairs were also changed to sample different regions
of X(D). Mrs. Grenke had no problem responding to the questions with the smaller
increments. Table 6.1 shows $z_4(2.5) = 0.67$, $z_4(3.0) = 1.02$, $z_4(3.5) = 0.81$; $z_5(2.0) = 0.68$,
$z_5(2.5) = 0.45$, and $z_5(3.0) = 0.58$. These estimates indicate $z_4$ and $z_5$ are
certainly not decreasing functions of $x_4$ and $x_5$, respectively. Forced to choose
between the constant or decreasing models, we select the constant parameter in each
case and use the exponential forms $-a_4 e^{-w_4 x_4}$ and $-a_5 e^{-w_5 x_5}$ for attributes $x_4$ and
$x_5$ in the proxy function.

Table 6.1 Assessments for Selecting the Proxy Function

Coefficient   Increment            Tradeoff assessment             Estimate of $z_i(x_i)$

$z_2$         $\Delta x_2 = 1.0$   $\lambda_2(7.0, 3.0) = 1.0$     $z_2(3.0) = 0.51$
                                   $\lambda_2(7.0, 4.0) = 0.6$     $z_2(4.0) = 0.41$
                                   $\lambda_2(7.0, 5.0) = 0.4$     $z_2(5.0) = 0.69$
                                   $\lambda_2(7.0, 6.0) = 0.2$

$z_3$         $\Delta x_3 = 1.0$   $\lambda_3(7.0, 4.0) = 1.2$     $z_3(4.0) = 0.69$
                                   $\lambda_3(7.0, 5.0) = 0.6$     $z_3(5.0) = 0.41$
                                   $\lambda_3(7.0, 6.0) = 0.4$     $z_3(6.0) = 0.69$
                                   $\lambda_3(7.0, 7.0) = 0.2$

$z_4$         $\Delta x_4 = 0.5$   $\lambda_4(6.0, 2.5) = 0.7$     $z_4(2.5) = 0.67$
                                   $\lambda_4(6.0, 3.0) = 0.5$     $z_4(3.0) = 1.02$
                                   $\lambda_4(6.0, 3.5) = 0.3$     $z_4(3.5) = 0.81$
                                   $\lambda_4(6.0, 4.0) = 0.2$

$z_5$         $\Delta x_5 = 0.5$   $\lambda_5(8.0, 2.0) = 0.7$     $z_5(2.0) = 0.68$
                                   $\lambda_5(8.0, 2.5) = 0.5$     $z_5(2.5) = 0.45$
                                   $\lambda_5(8.0, 3.0) = 0.4$     $z_5(3.0) = 0.58$
                                   $\lambda_5(8.0, 3.5) = 0.3$

$z_6$         $\Delta x_6 = 0.2$   $\lambda_6(8.0, 1.6) = 1.0$     $z_6(1.6) = 0.00$
                                   $\lambda_6(8.0, 1.8) = 1.0$     $z_6(1.8) = 1.12$
                                   $\lambda_6(8.0, 2.0) = 0.8$     $z_6(2.0) = 0.00$
                                   $\lambda_6(8.0, 2.2) = 0.8$

$z_6$         $\Delta x_6 = 0.5$   $\lambda_6(8.0, 1.5) = 1.3$     $z_6(1.5) = 0.74$
                                   $\lambda_6(8.0, 2.0) = 0.9$     $z_6(2.0) = 0.81$
                                   $\lambda_6(8.0, 2.5) = 0.6$     $z_6(2.5) = 0.81$
                                   $\lambda_6(8.0, 3.0) = 0.4$

$z_1$         $\Delta x_1 = 0.5$   $\lambda_1(5.0, 6.0) = 1.3$     $z_1(5.0) = 0.52$
                                   $\lambda_1(5.5, 6.0) = 1.0$     $z_1(5.5) = 0.71$
                                   $\lambda_1(6.0, 6.0) = 0.7$     $z_1(6.0) = 0.67$
                                   $\lambda_1(6.5, 6.0) = 0.5$

For attribute $x_6$, I reduced the tradeoff assessment increment to $\Delta x_6 = 0.2$, but
the decision maker responded to several questions inconsistently. Table 6.1 shows
$\lambda_6(8.0, 1.6) = \lambda_6(8.0, 1.8)$ and $\lambda_6(8.0, 2.0) = \lambda_6(8.0, 2.2)$. These tradeoffs yield
$z_6(1.6) = 0$, $z_6(1.8) = 1.12$, and $z_6(2.0) = 0$, implying regions of constant marginal rates of
substitution. These regions of linear indifference curves violate Axiom 2.4 (decreasing
marginal rates of substitution). Apparently, Mrs. Grenke had trouble distinguishing
between pairs of attributes whose corresponding components differed by less than 10
minutes.

When the assessment procedure was repeated using $\Delta x_6 = 0.5$, Mrs. Grenke's
responses obeyed Axiom 2.4. Table 6.1 shows the new tradeoffs and the resulting values
of $z_6(x_6)$: $z_6(1.5) = 0.74$, $z_6(2.0) = 0.81$, and $z_6(2.5) = 0.81$. The constant $z_6$ model
gives a good fit for these data points.

Finally, we model $z_1$ using $x_2$ as the price variable and $\Delta x_1 = 0.5$ as the
assessment increment. Table 6.1 shows the tradeoff assessments and the resulting values
of $z_1(x_1)$: $z_1(5.0) = 0.52$, $z_1(5.5) = 0.71$, and $z_1(6.0) = 0.67$. We again find that none
of the models is appropriate, but the constant $z_1$ form provides the best fit. None of
the $z_i$ parameters shows a clearly decreasing pattern, so we use the constant $z_i$ model
for each attribute. The resulting proxy is the sum-of-exponentials function:

$$P(x) = -\sum_{i=1}^{6} a_i e^{-w_i x_i}.$$

All  six  attributes  are  similar  entities  measured  along  a common  dimension.  It  is 
therefore  not  surprising  that  the  tradeoff  structure  is  similar  for  each  attribute. 
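To make the sum-of-exponentials form concrete, the following Python sketch evaluates such a proxy and the tradeoffs it implies. It is an illustration under stated assumptions: the parameter values are hypothetical, the function names are introduced here, and the tradeoff of attribute j against the price attribute is taken to be the ratio of the two partial derivatives of the proxy.

```python
import math


def proxy(x, a, w):
    """Sum-of-exponentials proxy P(x) = -sum_i a_i * exp(-w_i * x_i)."""
    return -sum(ai * math.exp(-wi * xi) for xi, ai, wi in zip(x, a, w))


def tradeoffs(x, a, w, price=0):
    """Marginal rates of substitution relative to the price attribute.

    Assumes the tradeoff for attribute j equals (dP/dx_j) / (dP/dx_price).
    """
    grad = [ai * wi * math.exp(-wi * xi) for xi, ai, wi in zip(x, a, w)]
    return [g / grad[price] for g in grad]


# Hypothetical parameters chosen only for illustration.
a = [1.0] * 6
w = [0.5] * 6
x = [2.0] * 6
print(tradeoffs(x, a, w))                    # every entry equals 1 at a symmetric point
print(tradeoffs([2, 3, 2, 2, 2, 2], a, w))   # the entry for x_2 falls below 1
```

With positive $a_i$ and $w_i$, raising one attribute lowers its tradeoff against the price variable, which is the decreasing marginal rate of substitution required by Axiom 2.4.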

Beginning  the  Iterative  Procedure 

Having selected the form of the proxy function, we may now begin the first
iteration. We drop the deterministic additivity assumption, so we must specify all six
attributes when assessing tradeoffs between any pair.

Mrs.  Grenke  wants  to  emphasize  reading  and  arithmetic.  She  knows  the  students 
benefit  from  the  individualized  instruction  in  these  two  subjects,  but  wonders  whether 
one  reading  and  one  math  period  per  day  is  sufficient.  We  use  this  information  to 
choose  the  two  initial  points: 

$x^1$ = (8, 5, 6, 2, 0.5, 1.5)
$x^2$ = (4.5, 4, 4.5, 4, 3, 3);

$x^1$ gives a strong emphasis to reading and math, while $x^2$ gives them only a slight
emphasis relative to the other subjects.

We want the tradeoff assessments to represent the true gradient as accurately as
possible. To help Mrs. Grenke conceptualize different outcome vectors, I asked each
teacher to provide typical lesson plans for forty-five minute periods in his or her
subjects. Mrs. Grenke could refer to these characteristic lesson plans to help visualize
fractional parts of periods. In the assessment procedure, the price variable is $x_1$ and
the assessment increment is one-half period. Both the single point and successive point
consistency tests are used.

Figure 6.1a shows the tradeoffs assessed at $x^1$. To check consistency, a second set
of tradeoffs was assessed at $x^1$, with $x_2$ as the price variable. I set an allowable
tolerance of 25%; if the discrepancy for any tradeoff exceeds 25%,
it must be resolved. These errors are resolved by explaining the inconsistency to the
decision maker, showing her the direction in which the violating component must
change, and reassessing the tradeoffs until the tolerance condition is satisfied. In cases
where $\lambda_{ij}$ is very small, 0.3 or less, a larger tolerance of 35% is allowed.
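The percent errors listed in the iteration figures can be reproduced by a simple rule, which is inferred here from the reported numbers rather than quoted from the text: divide each tradeoff assessed with price variable $x_1$ by the $x_1$-based tradeoff for the second price variable, and compare the result with the tradeoff assessed directly against the second price variable. The Python sketch below applies this assumed rule to the values shown in Figure 6.2; the function names and the way the 35% allowance is applied are also assumptions.

```python
def percent_errors(lam_first, lam_second, k):
    """Percent discrepancy between two tradeoff vectors at one point.

    lam_first  : tradeoffs assessed with price variable x_1
    lam_second : tradeoffs assessed with price variable x_k (k is 1-based)
    """
    errors = []
    for l1, lk in zip(lam_first, lam_second):
        implied = l1 / lam_first[k - 1]   # x_k-based tradeoff implied by lam_first
        errors.append(100 * abs(implied - lk) / implied)
    return errors


def violations(lam_first, lam_second, k, tol=25.0, small=0.3, small_tol=35.0):
    """1-based indices of components whose error exceeds the tolerance.

    The larger tolerance is applied when the x_1-based tradeoff is `small` or less.
    """
    errs = percent_errors(lam_first, lam_second, k)
    return [
        j + 1
        for j, (l1, e) in enumerate(zip(lam_first, errs))
        if e > (small_tol if l1 <= small else tol)
    ]


# Values read from Figure 6.2 (second price variable x_4).
lam_1 = [1.0, 1.6, 1.2, 1.4, 1.8, 1.8]
lam_4 = [0.6, 0.8, 1.0, 1.0, 1.0, 1.4]
print([round(e, 1) for e in percent_errors(lam_1, lam_4, 4)])
# [16.0, 30.0, 16.7, 0.0, 22.2, 8.9]
print(violations(lam_1, lam_4, 4))   # [2]
```

Only the second component exceeds the 25% tolerance in this example, so under this rule at least that tradeoff would have to be reassessed.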

Figures 6.1a and 6.1b show the tradeoff assessments and consistency tests at $x^1$
and $x^2$. At both points, the discrepancies exceeded the 25% tolerance, so several
reassessments were required. The attribute $x_3$ was used as the price variable for the
consistency check at $x^2$.

$x^1$: (8.0, 5.0, 6.0, 2.0, 0.5, 1.5)

Tradeoff assessments at $x^1$, with price variable $x_1$: (1.0, 1.0, 0.6, 1.6, 2.4, 1.4)

Single point consistency check, with price variable $x_2$

$\lambda_{2j}(x^1)$: (0.8, 1.0, 0.8, 1.8, 2.2, 1.0)

Percent error by tradeoff:

20.0  0.0  33.3  12.5  8.3  28.6

Error exceeds tolerance: Reassess violating components

$\lambda_{16}(x^1) \leftarrow 1.3$   $\lambda_{23}(x^1) \leftarrow 0.6$   $\lambda_{26}(x^1) \leftarrow 1.2$

Percent error by tradeoff:

20.0  0.0  0.0  12.5  8.3  7.7
These tradeoffs will be used.
Figure 6.1a First Iteration

$x^2$: (4.5, 4.0, 4.5, 4.0, 3.0, 3.0)

Tradeoff assessments at $x^2$, with price variable $x_1$: (1.0, 0.4, 1.0, 0.2, 0.2, 0.2)

Single point consistency check, with price variable $x_3$
$\lambda_{3j}(x^2)$: (0.7, 0.2, 1.0, 0.3, 0.2, 0.2)

Percent error by tradeoff:

30.0  50.0  0.0  50.0  0.0  0.0

Error exceeds tolerance: Reassess violating components

$\lambda_{12}(x^2) \leftarrow 0.5$   $\lambda_{13}(x^2) \leftarrow 0.9$   $\lambda_{14}(x^2) \leftarrow 0.3$
$\lambda_{31}(x^2) \leftarrow 1.0$   $\lambda_{32}(x^2) \leftarrow 0.6$   $\lambda_{35}(x^2) \leftarrow 0.3$

Percent error by tradeoff:

10.0  8.0  0.0  10.0  35.0  10.0
These tradeoffs will be used.

Nearby point $x^{2'}$: (6.0, 4.0, 4.5, 4.0, 3.0, 3.0)

Assess single tradeoff at $x^{2'}$: $\lambda_{12}(x^{2'}) = 0.7$

Fit proxy and perform successive point consistency test:

Proxy is consistent with DMRS

Maximize proxy with convex programming algorithm:

New maximum $x^3$ is: 7.58  3.97  5.24  2.84  1.65  1.73

Ask decision maker: $x^3 \succ x^2$ ?

Yes

Iteration improved objective. Use new maximum in next iteration.

Figure 6.1b First Iteration

After the single point inconsistencies were resolved, a single tradeoff $\lambda_{12}$ at the nearby
point $x^{2'}$ = (6, 4, 4.5, 4, 3, 3) was assessed and checked. The successive point
consistency check was then used; if the parameters $a_i$ and $w_i$ are positive, the proxy is
consistent with Axiom 2.4 (decreasing marginal rates of substitution). Figure 6.1b
indicates the proxy passed this consistency test.

The convex programming algorithm maximizes the proxy over X(D). Figure 6.1b
shows the new trial solution:

$x^3$ = (7.58, 3.97, 5.24, 2.84, 1.65, 1.73).

Mrs. Grenke preferred $x^3$ to $x^2$, so the new maximum was used to begin the next
iteration.
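The text does not spell out how the proxy parameters are computed from the assessed tradeoffs. One scheme that is consistent with the reported numbers is sketched below, under the assumption that each tradeoff equals the ratio of the proxy's partial derivatives with respect to $x_j$ and $x_1$. The nearby point, which differs from $x^2$ only in $x_1$, isolates $w_1$; the remaining exponents follow from the change in each tradeoff between $x^1$ and $x^2$. The function names are hypothetical, and the inputs are the reassessed tradeoffs read from Figures 6.1a and 6.1b.

```python
import math


def fit_exponents(x_a, lam_a, x_b, lam_b, x_near, lam12_near):
    """Recover w_1..w_n of P(x) = -sum a_i exp(-w_i x_i) from tradeoffs.

    Assumes ln(lam_1j) = const_j - w_j * x_j + w_1 * x_1 (price variable x_1).
    x_near must differ from x_b only in its first component.
    """
    w1 = math.log(lam12_near / lam_b[1]) / (x_near[0] - x_b[0])
    w = [w1]
    for j in range(1, len(x_a)):
        log_ratio = math.log(lam_a[j] / lam_b[j])
        w.append((w1 * (x_a[0] - x_b[0]) - log_ratio) / (x_a[j] - x_b[j]))
    return w


def predicted_tradeoffs(x, x_ref, lam_ref, w):
    """Tradeoffs the fitted proxy implies at x, anchored at a reference point."""
    shift = w[0] * (x[0] - x_ref[0])
    return [
        lam_ref[j] * math.exp(shift - w[j] * (x[j] - x_ref[j]))
        for j in range(len(x))
    ]


x1 = (8.0, 5.0, 6.0, 2.0, 0.5, 1.5)
lam1 = (1.0, 1.0, 0.6, 1.6, 2.4, 1.3)
x2 = (4.5, 4.0, 4.5, 4.0, 3.0, 3.0)
lam2 = (1.0, 0.5, 0.9, 0.3, 0.2, 0.2)
x_near = (6.0, 4.0, 4.5, 4.0, 3.0, 3.0)

w = fit_exponents(x1, lam1, x2, lam2, x_near, 0.7)
print(all(wi > 0 for wi in w))   # positive exponents, as the consistency test requires

x3 = (7.58, 3.97, 5.24, 2.84, 1.65, 1.73)
print([round(v, 2) for v in predicted_tradeoffs(x3, x2, lam2, w)])
```

Under these assumptions every predicted tradeoff at the reported trial solution is within about one percent of 1. That is what one would expect at a maximum of the proxy where only the total time constraint binds, so the sketch appears consistent with the reported result, although it is not confirmed to be the procedure actually used.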

Figure 6.2 shows the tradeoff assessments at $x^3$ and the consistency tests. The
discrepancies again exceeded the allowed tolerance and required further resolution. The
proxy fit from these revised tradeoffs passed the successive point consistency test. This
new proxy generated the next trial solution $x^4$:

$x^4$ = (6.87, 3.99, 5.14, 3.00, 1.99, 2.01).

The time constraint and the constraint on Mr. Herriman are active at this point.

When asked to compare $x^4$ with $x^3$, Mrs. Grenke indicated without hesitation
that $x^4$ was preferred. This new maximum was used to begin the third iteration.

The tradeoff assessments for this iteration are shown in Figure 6.3. Only one
component exceeded the tolerance of the consistency test. After resolving the
inconsistency, an additional $\lambda_{12}$ was assessed and checked at $x^{4'}$ = (6, 4, 5.13, 3.00, 1.99,
2.01). The proxy fit from these assessments at $x^4$ was consistent with
Axiom 2.4. Figure 6.3 shows the new trial solution $x^5$ generated by the third iteration:

$x^5$ = (6.66, 3.99, 5.16, 2.84, 1.93, 2.42).

Attributes $x_2$, $x_3$, and $x_5$ changed very little from iteration two to iteration three.
Mrs. Grenke preferred $x^5$ to $x^4$ because the increase in the art/music component more
than compensated for the decrease in reading and social studies. Since the new trial
solution was an improvement, the next iteration could begin.

$x^3$: (7.58, 3.97, 5.24, 2.84, 1.65, 1.73)

Tradeoff assessments at $x^3$, with price variable $x_1$: (1.0, 1.6, 1.2, 1.4, 1.8, 1.8)

Single point consistency check, with price variable $x_4$

$\lambda_{4j}(x^3)$: (0.6, 0.8, 1.0, 1.0, 1.0, 1.4)

Percent error by tradeoff:

16.0  30.0  16.7  0.0  22.2  8.9

Error exceeds tolerance: Reassess violating components

$\lambda_{12}(x^3) \leftarrow 1.5$   $\lambda_{13}(x^3) \leftarrow 1.1$   $\lambda_{14}(x^3) \leftarrow 1.3$   $\lambda_{15}(x^3) \leftarrow 1.7$   $\lambda_{16}(x^3) \leftarrow 1.6$

$\lambda_{41}(x^3) \leftarrow 0.7$   $\lambda_{42}(x^3) \leftarrow 1.0$   $\lambda_{43}(x^3) \leftarrow 0.9$   $\lambda_{45}(x^3) \leftarrow 1.2$

Percent error by tradeoff:

9.0  13.3  6.4  0.0  8.2  13.7
These tradeoffs will be used.

Fit proxy and perform successive point consistency test:
Proxy is consistent with DMRS

Maximize proxy with convex programming algorithm:

New maximum $x^4$ is: 6.87  3.99  5.14  3.00  1.99  2.01

Ask decision maker: $x^4 \succ x^3$ ?

Yes

Iteration improved objective. Use new maximum in next iteration.
Figure 6.2 Second Iteration

$x^4$: (6.87, 3.99, 5.14, 3.00, 1.99, 2.01)

Tradeoff assessments at $x^4$, with price variable $x_1$: (1.0, 1.2, 1.2, 0.8, 1.0, 1.2)

Single point consistency check, with price variable $x_5$

$\lambda_{5j}(x^4)$: (1.2, 1.0, 1.0, 1.2, 1.0, 1.2)

Percent error by tradeoff:

20.0  16.7  16.7  50.0  0.0  0.0

Error exceeds tolerance: Reassess violating components

$\lambda_{12}(x^4) \leftarrow 1.1$   $\lambda_{13}(x^4) \leftarrow 1.1$   $\lambda_{14}(x^4) \leftarrow 0.9$   $\lambda_{51}(x^4) \leftarrow 1.1$   $\lambda_{54}(x^4) \leftarrow 1.1$

Percent error by tradeoff:

10.0  9.1  9.1  22.2  0.0  0.0
These tradeoffs will be used.

Nearby point $x^{4'}$: (6.00, 4.00, 5.14, 3.00, 1.99, 2.01)

Assess single tradeoff at $x^{4'}$: $\lambda_{12}(x^{4'}) = 0.8$

Fit proxy and perform successive point consistency test:

Proxy is consistent with DMRS

Maximize proxy with convex programming algorithm:

New maximum $x^5$ is: 6.66  3.99  5.16  2.84  1.93  2.42

Ask decision maker: $x^5 \succ x^4$ ?

Yes

Iteration improved objective. Use new maximum in next iteration.
Figure 6.3 Third Iteration

Figure 6.4 shows the tradeoff assessments at $x^5$. For the first time, the discrepancy
with the additional price variable was sufficiently small. However, the successive point
consistency test was violated. Figure 6.4 shows several proxy parameters were negative,
indicating increasing marginal rates of substitution. The inherent assessment error was
too large relative to the distance between $x^4$ and $x^5$. These two points were too close
for Mrs. Grenke to provide mathematically consistent responses. Consequently, we could
not fit a sum-of-exponentials proxy at $x^4$ and $x^5$ that obeyed Axioms 2.1-2.4.
Instead, we used the tradeoff vector at $x^5$ alone (which passed the single point
consistency test) to determine the linear approximation of the indifference curve. With
this linear proxy, we took a modified Frank-Wolfe spacer step. The new maximum $x^6$,
found by a linear programming algorithm, was the extreme point

$x^6$ = (0, 19, 4, 0, 0, 0).

This new maximum was obviously inferior, so we used it only to indicate the direction
of search; the Armijo relaxation procedure, with a = 0.8, yielded

$x^{6'}$ = (5.33, 6.49, 4.93, 2.27, 1.54, 1.94).

Mrs. Grenke preferred $x^5$ to $x^{6'}$, so another relaxation step was taken, yielding

$x^{6''}$ = (6.39, 4.59, 5.11, 2.73, 1.85, 2.32).

Mrs. Grenke still preferred $x^5$ to $x^{6''}$, so a third Armijo step was tried, yielding

$x^{6'''}$ = (6.61, 4.11, 5.15, 2.82, 1.91, 2.40).

Mrs. Grenke could barely distinguish $x^5$ from $x^{6'''}$; the largest difference in any
attribute was just over five minutes. Since the Frank-Wolfe step specifies the best local
direction of improvement, and since no distinguishable improvement was found in this
direction, we terminated the procedure, declaring $x^5$ as the optimal solution.
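The relaxation steps can be reproduced, to the reported precision, by moving from the current point toward the extreme point by a fraction that shrinks geometrically. The sketch below assumes the fraction at step k is $(1 - a)^k$ with a = 0.8; this rule is inferred from the reported points rather than stated in the text, and the function name is hypothetical.

```python
def relaxation_steps(x_current, x_extreme, a=0.8, n_steps=3):
    """Points x_current + (1 - a)**k * (x_extreme - x_current), k = 1..n_steps."""
    steps = []
    for k in range(1, n_steps + 1):
        t = (1 - a) ** k
        steps.append(
            [round(c + t * (e - c), 2) for c, e in zip(x_current, x_extreme)]
        )
    return steps


current = (6.66, 3.99, 5.16, 2.84, 1.93, 2.42)
extreme = (0.0, 19.0, 4.0, 0.0, 0.0, 0.0)
for point in relaxation_steps(current, extreme):
    print(point)
```

Under this assumption the second and third steps match the reported points exactly. The first step matches in five components, but its language arts component comes out as 6.99, whereas the text and Figure 6.4 both report 6.49. The largest gap between the third step and the current point is about 0.12 period, or a little over five minutes, in agreement with the text.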

$x^5$: (6.66, 3.99, 5.16, 2.84, 1.93, 2.42)

Tradeoff assessments at $x^5$, with price variable $x_1$: (1.0, 1.2, 1.0, 0.8, 1.0, 1.0)

Single point consistency check, with price variable $x_6$

$\lambda_{6j}(x^5)$: (1.0, 1.0, 1.2, 1.0, 0.8, 1.0)

Percent error by tradeoff:

0.0  16.7  20.0  25.0  20.0  0.0

These tradeoffs will be used.

Fit proxy and perform successive point consistency test:

Proxy violates DMRS: several of the parameters $a_i$ and $w_i$ are negative
Points $x^4$ and $x^5$ too close

Fit linear proxy at $x^5$ and take spacer step
Maximize proxy with linear programming algorithm
New maximum $x^6$ is: 0.0  19.0  4.0  0.0  0.0  0.0

Ask decision maker: $x^6 \succ x^5$ ? No

Use relaxation procedure: $x^{6'}$: 5.33  6.49  4.93  2.27  1.54  1.94

Ask decision maker if $x^{6'} \succ x^5$ ? No

Use relaxation procedure: $x^{6''}$: 6.39  4.59  5.11  2.73  1.85  2.32

Ask decision maker if $x^{6''} \succ x^5$ ? No

Use relaxation procedure: $x^{6'''}$: 6.61  4.11  5.15  2.82  1.91  2.40

Ask decision maker if $x^{6'''} \succ x^5$ ? Indifferent

Optimal solution: $x^5$

Figure 6.4 Fourth Iteration

6.4 Implementing the Optimal Solution

For simplicity, several specific hourly constraints on teachers and subjects were not
included in the decision model. Having found the optimal allocation, I tried to design a
class schedule that implemented the optimal curriculum and met all the additional
requirements listed below:

A.  Chapel  must  be  the  first  period  on  Monday. 

B.  Physical  education  can  only  be  the  last  period  of  the  day. 

C.  Art  and  music  can  only  be  taught  during  periods  five  and  six  on 
Monday,  Wednesday,  and  Friday  afternoons. 

D.  Foreign  language  classes  must  be  taught  after  11:30  on  Tuesdays  and 
Thursdays. 

E.  Mrs.  Findlay's  teaching  periods  must  occur  in  continuous  blocks, 
without  interspersed  free  periods  (except  lunch). 

F.  All  students  must  be  supervised  during  the  entire  day.  Therefore, 
when  the  combination  classes  are  separated,  one  grade  must  have  reading  while 
the  other  has  math. 

G.  Lunch  period  should  begin  at  noon  or  shortly  thereafter  (a  lower 
school  requirement  imposed  on  the  upper  school). 

H.  Five  to  ten  minutes  are  needed  at  the  beginning  of  each  school  day 
for  a homeroom  period. 

After  a considerable  amount  of  juggling  teachers,  subjects,  and  hours,  I devised  an 
actual  class  schedule  implementing  the  optimal  solution.  Figure  6.5  shows  this  optimal 
curriculum.  The  time  constraint  and  the  constraints  on  Mrs.  Grenke,  Mrs.  Findlay,  and 
Mr. Herriman are active, implying all their teaching hours are utilized. The morning
periods were shortened slightly to forty-two minutes to provide an earlier lunch break
than would have been possible with six uniform forty-five minute periods. The two
afternoon periods were lengthened to fifty-four minutes, filling the rest of the day. The
longer afternoon periods allow for the optimal art/music allocation without split
periods.  Only  two  class  periods  per  week  are  split  between  two  subjects,  but  the  same 
instructor  teaches  both  subjects  to  the  same  group,  so  no  additional  class  breaks  are 
required.  The  weekly  class  schedule  in  Figure  6.5  is  equivalent  to  the  following 
allocation expressed in forty-five minute periods:

x = (6.65, 3.97, 5.13, 2.79, 1.92, 2.40).

The optimal solution found by the proxy algorithm was:

$x^5$ = (6.66, 3.99, 5.16, 2.84, 1.93, 2.42).

The actual implementation sums to 22.86 instead of 23 periods; the slight differences in
the second decimal result from including the eight minute homeroom period.

Fifth-Sixth Grade

Period       1        2        3                         4        5                          6
Monday       Chapel   5M 6R    LA                        SS       5R 6M                      A/M
Tuesday      5R 6M    5M 6R    LA                        FL       Sci                        PE
Wednesday    5R 6M    5M 6R    LA                        SS       5-6 R                      PE
Thursday     5R 6M    5M 6R    LA                        FL       Sci (32m.), 5-6 M (22m.)   PE
Friday       5R 6M    5M 6R    5-6 R (30m.), LA (12m.)   SS       Spec. Proj.                A/M

Seventh-Eighth Grade

Period       1        2                         3        4        5             6
Monday       Chapel   SS                        7M 8R    7R 8M    A/M           Sci (32m.), 7-8 M (22m.)
Tuesday      LA       7-8 R                     7M 8R    7R 8M    FL            PE
Wednesday    LA       SS                        7M 8R    7R 8M    Spec. Proj.   PE
Thursday     LA       LA (12m.), 7-8 R (30m.)   7M 8R    7R 8M    FL            PE
Friday       LA       SS                        7M 8R    7R 8M    A/M           Sci

R:    Reading                Homeroom:      9:00 - 9:08
M:    Arithmetic             Period 1:      9:11 - 9:53
LA:   Language Arts          Period 2:      9:56 - 10:38
SS:   Social Studies         Recess:        10:38 - 10:53
Sci:  Science                Period 3:      10:53 - 11:35
FL:   Foreign Language       Period 4:      11:38 - 12:20
A/M:  Art or Music           Lunch-Recess:  12:20 - 1:06
PE:   Physical Education     Period 5:      1:09 - 2:03
                             Period 6:      2:06 - 3:00

Figure 6.5 Implementation of the Optimal Curriculum

Mrs. Grenke was very pleased with the analysis. She believed the solution was truly
optimal and implemented it on a regular basis in January 1977.


Chapter VII
SUMMARY 

7.1  Conclusions 

In  this  thesis,  I have  combined  two  rival  preference  modeling  techniques  in  a new 
approach  to  multi-attribute  decision  making.  This  new  approach  incorporates  the 
normatively  motivated  preference  models  of  the  global  procedure  as  proxy  functions  in  a 
local  procedure.  The  proxy  approach  uses  the  advantages  of  one  technique  to  overcome 
the  disadvantages  of  the  other;  the  resulting  combined  technique  yields  rapid 
convergence  without  restrictive  assumptions. 

The curriculum planning problem shows the proxy approach is practical for
decision making under certainty. The decision maker, previously unfamiliar with
decision  analysis,  was  able  to  provide  the  assessments  required  at  each  iteration.  With 
the  help  of  the  consistency  tests,  the  tradeoff  assessments  generated  trial  solutions  that 
converged  rapidly  to  the  optimum. 

7.2  Suggestions  for  Future  Research 

In  Chapter  V,  we  observed  theoretical  aspects  of  the  decision  problem  under 
uncertainty  that  present  major  obstacles  for  the  proxy  algorithm.  Although  the  iterative 
local  procedure  is  not  well  suited  for  decision  making  under  uncertainty,  other  local 
preference  modeling  procedures  may  be  useful.  Perhaps  the  analyst  could  construct  local 
preference  models  in  different  regions  of  the  outcome  space  and  piece  them  together  to 
form  a less  restrictive  global  preference  function.  Design  of  an  interpolation  scheme 
guaranteeing  the  conglomerate  global  function  satisfies  the  risk  preference  axioms  would 
be  an  important  practical  contribution. 

A second  area  deserving  more  attention  is  the  selection  of  the  outcome  attributes 
themselves.  The  proxy  iteration  algorithm  takes  the  outcome  variables  as  given,  and 
addresses  the  problem  of  finding  the  optimal  decision;  it  does  not  focus  on  construction 
of the decision model itself. Selecting the decision variables, state variables, and
outcome variables, and modeling the relationships among them are crucial parts of a
decision  analysis.  Various  hierarchical  and  economic  modeling  schemes  have  been  tried, 
but  no  unified  approach  to  modeling  multi-attribute  problems  currently  exists.  Building 
a general  framework  to  help  identify  the  key  elements  of  complex  decisions  remains  an 
important topic for future research.


Appendix A
NOTATION 


$x$                        one-dimensional outcome variable
$x$ (underscored)          multi-attribute outcome vector; the underscore indicates column vector
$x'$                       transpose of x
$d$                        vector of decision variables
$D$                        set of feasible decisions d
$X(D)$                     subset of Euclidean space restricted to D
$s$                        vector of state variables
$\{s|\varepsilon\}$        joint probability distribution over s given state of information $\varepsilon$
$<x|\varepsilon>$          expectation of x given $\varepsilon$
$<x|d,\varepsilon>$        conditional expectation of x given d and $\varepsilon$
$\lambda_{ij}(x)$          marginal rate of substitution of $x_i$ for $x_j$ at x
$\lambda_j(x)$             marginal rate of substitution of Xj for Xj at x
$V(x)$                     deterministic preference function
$n(x)$                     numeraire function
$u(x)$                     risk preference function
$h[x | x^k]$               linear approximation of deterministic preference function fit at $x^k$
$p[x | x^k, x^{k+1}]$      sum-of-exponentials approximation of deterministic preference
                           function, fit from tradeoff assessments at $x^k$ and $x^{k+1}$
$q[d | d^k, s]$            approximation of risk preference function fit at $x^{(k+1)}$
$\succ$                    strict preference relation
$\sim$                     indifference relation
$\succeq$                  weak preference relation
$>$                        is greater than
$=$                        is equal to
$\geq$                     is greater than or equal to
$\{x^k\}$                  infinite sequence in x
$x^{\dagger}$              limit point of $\{x^k\}$
$\nabla f$                 gradient of the function f, defined as a row vector
$\nabla^2 f$               Hessian matrix of f
$\Sigma$                   general summation operator
$\int f(x)$                indefinite integral of f(x)
$\Sigma_i f_i$             summation of $f_i$ over i; lower limit is 1 and upper limit is N
                           unless specified otherwise
$\equiv$                   is identically equal to


Appendix  B 

BEHAVIORAL  PROPERTIES 
OF  SEVERAL  PREFERENCE  FUNCTIONS 


The Sum-of-Exponentials, Sum-of-Powers, and Cobb-Douglas
Deterministic Preference Functions

With Axiom 2.4, we assumed decreasing marginal rates of substitution: as the
individual accumulates increasing amounts of attribute $x_j$, the marginal price $dx_i$ he is
willing to pay for each additional $dx_j$ declines. This property alone does not limit the
preference function to a specific form. A measure of the rate at which the marginal rate
of substitution decreases is needed. Keelin's marginal value reduction coefficient
[20],

$z_{ij}(x) = -[\partial \lambda_{ij}(x) / \partial x_j] / \lambda_{ij}(x)$,

measures this property; it is defined as the percentage decrease in $\lambda_{ij}(x)$ per unit
increase in $x_j$ with all other attributes held constant.

Keelin and Barrager each assume deterministic independence. For $N \geq 3$, this very
strong assumption states that preferences among any subset of attributes are independent
of fixed levels of the other attributes. This assumption guarantees the preference
function has an additive form, $V(x) = \sum_i v_i(x_i)$.

Combining various assumptions about the coefficient $z_{ij}(x)$ with deterministic
additivity, Keelin and Barrager derive the sum-of-exponentials, sum-of-powers and
Cobb-Douglas preference functions. The assumptions are stated below for each
function; the derivations are found in their dissertations [3], [20].

Sum-of-Exponentials. If $z_{ij}(x)$ is a positive constant $w_j$ for all $j \neq i$, and if
$V(x) = \sum_i v_i(x_i)$, then $V(x) = -\sum_i a_i e^{-w_i x_i}$.

For each additional $\Delta x_j$, the percentage decrease in $\lambda_{ij}$ is constant at all x,
independent of the level of any attribute. This preference function obeys the following
property:



$x^1 \sim x^2 \Rightarrow x^1 + \Delta \sim x^2 + \Delta$

where $\Delta' = (\Delta, \Delta, \ldots, \Delta)$.

Its indifference curves are convex if and only if w, a > 0.

Proof: Along any indifference curve,
$\partial[dx_i/dx_j] / \partial x_j = [(w_j^2 a_j)/(w_i a_i)] e^{w_i x_i} e^{-w_j x_j}$.
If w, a > 0, each term is positive, so $\partial[dx_i/dx_j] / \partial x_j > 0$ for all i, j. For the converse,
Axiom 2.4 implies $\partial[dx_i/dx_j] / \partial x_j > 0$ for all i, j. Nonsatiety implies $\partial V(x)/\partial x_j > 0$,
so $w_j a_j > 0$ for all j. Therefore $[(w_j^2 a_j)/(w_i a_i)] e^{w_i x_i} e^{-w_j x_j} > 0 \Rightarrow w_j, a_j > 0 \Rightarrow$
w, a > 0. Q.E.D.

Keelin's exponential estimate of $z_{ij}(x)$:

For $v_j(x_j) = -a_j e^{-w_j x_j}$, the marginal value reduction coefficient is $z_{ij}(x) = z_j(x_j) = w_j$.

Let $x = (x_1, x_2, \ldots, x_j, \ldots, x_N)$ and $x^0 = (x_1, x_2, \ldots, x_{j-1}, x_j + \Delta x_j, x_{j+1}, \ldots, x_N)$.

Taking the ratio of $\lambda_{ij}$ at these points,

$\lambda_{ij}(x) / \lambda_{ij}(x^0) = e^{w_j \Delta x_j}$,

so

$z_{ij}(x) = w_j = (1/\Delta x_j) \ln[\lambda_{ij}(x) / \lambda_{ij}(x^0)]$.  (B.1)

This equation provides an exponential estimate for $z_{ij}(x)$.
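
As a rough numerical illustration of equation (B.1), the sketch below recovers the exponent from two marginal rates of substitution measured one step apart in attribute j. It assumes the sum-of-exponentials form V(x) = -sum_k a_k exp(-w_k x_k) and takes the marginal rate of substitution to be the ratio of partial derivatives (dV/dx_j)/(dV/dx_i); the function names and the parameter values are hypothetical.

```python
import math


def mrs(a, w, x, i, j):
    """Assumed definition: lambda_ij(x) = (dV/dx_j) / (dV/dx_i)
    for V(x) = -sum_k a[k] * exp(-w[k] * x[k])."""
    def partial(k):
        return a[k] * w[k] * math.exp(-w[k] * x[k])
    return partial(j) / partial(i)


def exponential_estimate(lam_at_x, lam_at_shifted, delta_xj):
    """Equation (B.1): z_ij = (1 / delta_xj) * ln(lambda_ij(x) / lambda_ij(x0))."""
    return math.log(lam_at_x / lam_at_shifted) / delta_xj


if __name__ == "__main__":
    a = [1.0, 2.0, 0.5]      # hypothetical scaling constants
    w = [0.3, 0.7, 1.1]      # hypothetical exponents
    x = [1.0, 2.0, 3.0]
    i, j, step = 0, 1, 0.5
    shifted = list(x)
    shifted[j] += step       # the point x0: attribute j raised by the step
    estimate = exponential_estimate(mrs(a, w, x, i, j), mrs(a, w, shifted, i, j), step)
    print("estimated z_ij:", estimate, "true w_j:", w[j])
```

In an interactive assessment the two marginal rates would come from the decision maker's tradeoff judgments rather than from a known function; the formula is the same.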

Sum-of-Powers: If $z_{ij}(x) = (1+\alpha_j)/x_j$ for all $j \neq i$, and if $V(x) = \sum_i v_i(x_i)$, then

$V(x) = -\sum_i a_i x_i^{-\alpha_i}$ when $\alpha_i \neq 0$, and $V(x) = \sum_i a_i \ln x_i$ when $\alpha_i = 0$.

With  these  functions,  an  additional  unit  of  any  attribute  implies  a smaller 
percentage  decrease  in  its  marginal  value.  As  the  individual  accumulates  more  of  each 
attribute,  he  becomes  less  sensitive  to  substitutions  among  them. 

For the special case when $\alpha_i = 0$, the preference function $V(x) = \sum_i a_i \ln x_i$ is the
additive form of the Cobb-Douglas function $V(x) = \prod_i x_i^{a_i}$. Both functions define the
same preference ordering since they are equivalent via a logarithmic transformation. For
the Cobb-Douglas function, preferences are invariant under scaling: if $x^1 \sim x^2$ then
$bx^1 \sim bx^2$ for any positive constant b. This invariance also holds for scaling on a single
attribute.

Each of these preference functions is strictly concave since its Hessian matrix is
negative definite. The indifference surfaces associated with each preference function are
strictly convex since the marginal rates of substitution are strictly decreasing.
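
The scaling invariance claimed for the Cobb-Douglas function can be illustrated numerically. The following is a minimal sketch with hypothetical exponents and outcome vectors, chosen so that the two outcomes lie on the same indifference surface; it is not taken from the dissertation.

```python
import math


def cobb_douglas(x, a):
    """Product form: V(x) = prod_i x_i ** a_i, for strictly positive x_i."""
    return math.prod(xi ** ai for xi, ai in zip(x, a))


def log_additive(x, a):
    """Additive form: V(x) = sum_i a_i * ln(x_i)."""
    return sum(ai * math.log(xi) for xi, ai in zip(x, a))


def scale_all(x, b):
    """Scale every attribute by the positive constant b."""
    return [b * xi for xi in x]


def scale_one(x, k, b):
    """Scale only attribute k by the positive constant b."""
    return [b * xi if idx == k else xi for idx, xi in enumerate(x)]


if __name__ == "__main__":
    a = [0.5, 0.5]                    # hypothetical exponents
    x1, x2 = [1.0, 4.0], [2.0, 2.0]   # both give V = 2, so x1 ~ x2
    for label, u, v in [
        ("original", x1, x2),
        ("all attributes scaled by 3", scale_all(x1, 3.0), scale_all(x2, 3.0)),
        ("first attribute scaled by 3", scale_one(x1, 0, 3.0), scale_one(x2, 0, 3.0)),
    ]:
        print(label, cobb_douglas(u, a), cobb_douglas(v, a))
```

Because the logarithm is strictly increasing, the additive form ranks any two outcomes the same way as the product form.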

State-of-the-art  procedures  discussed  in  section  2.2  use  these  functions  as  global 
preference  models.  The  underlying  assumptions  are  very  restrictive  in  the  large,  but  they 
are  quite  reasonable  and  provide  good  proxy  functions  in  the  small. 

The Keeney-Raiffa-Fishburn Independence Conditions [11], [20], [21]

Definition. Risk Additive Independence: Attributes $x_1, x_2, \ldots, x_N$ are risk additive
independent if preferences over lotteries on x depend only on the marginal probability
distributions $\{x_1|\varepsilon\}, \{x_2|\varepsilon\}, \ldots, \{x_N|\varepsilon\}$ and not on the joint probability distribution
$\{x_1, x_2, \ldots, x_N|\varepsilon\}$.

Theorem. A multi-attribute utility function is additive, $u(x) = \sum_i u_i(x_i)$, if and only if
the attributes $x_1, x_2, \ldots, x_N$ are risk additive independent.
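
One direction of this theorem is easy to see numerically: under an additive utility function, two lotteries with the same marginal distributions have the same expected utility, even when their joint distributions differ. The sketch below uses two hypothetical two-attribute lotteries and hypothetical utility functions chosen only for illustration.

```python
def expected_utility(lottery, u):
    """Expected utility of a lottery given as (outcome, probability) pairs."""
    return sum(p * u(x) for x, p in lottery)


def marginal(lottery, k):
    """Marginal probability distribution of attribute k."""
    dist = {}
    for x, p in lottery:
        dist[x[k]] = dist.get(x[k], 0.0) + p
    return dist


def additive_u(x):
    """Hypothetical additive utility u(x) = u1(x1) + u2(x2)."""
    return 2.0 * x[0] + 3.0 * x[1]


def product_u(x):
    """Hypothetical non-additive (product-form) utility, for contrast."""
    return (1.0 + x[0]) * (1.0 + x[1])


# Same marginals (each attribute is 0 or 1 with probability 1/2),
# but different joint distributions.
lottery_a = [((0, 0), 0.5), ((1, 1), 0.5)]
lottery_b = [((0, 1), 0.5), ((1, 0), 0.5)]

if __name__ == "__main__":
    print("additive:", expected_utility(lottery_a, additive_u),
          expected_utility(lottery_b, additive_u))
    print("product: ", expected_utility(lottery_a, product_u),
          expected_utility(lottery_b, product_u))
```

The product-form utility distinguishes the two lotteries, so preferences under it depend on the joint distribution and the attributes are not risk additive independent.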

Definition. Utility Independence: Let Y be a subset of $x_1, x_2, \ldots, x_N$. Then Y is
utility independent of its complement $Y^c$ if for any pair of lotteries $\{Y|\varepsilon_1\}$ and
$\{Y|\varepsilon_2\}$,

$\{Y|\varepsilon_1\} \succ \{Y|\varepsilon_2\}$ for some fixed $Y^c$ $\Rightarrow$ $\{Y|\varepsilon_1\} \succ \{Y|\varepsilon_2\}$ for any $Y^c$

where $\succ$ indicates strict preference over lotteries. Attributes $x_1, x_2, \ldots, x_N$ are
mutually utility independent if every subset of $x_1, x_2, \ldots, x_N$ is utility independent of its
complement.


Theorem. A multi-attribute utility function has either an additive or multiplicative
form, $u(x) = \sum_i u_i(x_i)$ or $u(x) = \prod_i u_i(x_i)$, if and only if the attributes $x_1, x_2, \ldots, x_N$
are mutually utility independent.


APPENDIX  C 

GENERAL  OPTIMIZATION  THEOREMS 
USED  IN  THIS  THESIS 


Necessary  and  Sufficient  Conditions  for  Optimality 
Definition. Let $x^* \in E^n$ be a point satisfying the constraints

$h(x^*) = 0$, $g(x^*) \leq 0$,

where h(x) represents a set of m equalities $h_1(x) = 0, h_2(x) = 0, \ldots, h_m(x) = 0$, and
g(x) represents a set of p inequalities $g_1(x) \leq 0, g_2(x) \leq 0, \ldots, g_p(x) \leq 0$. Let J
be the set of indices j for which $g_j(x^*) = 0$. Then x* is said to be a regular point of
the constraints if the gradient vectors $\nabla h_i(x^*)$, $\nabla g_j(x^*)$, $1 \leq i \leq m$, $j \in J$ are linearly
independent.

Kuhn-Tucker Conditions. Let x* be a relative minimum point for the problem

minimize f(x)

subject to $h(x) = 0$, $g(x) \leq 0$

and suppose x* is a regular point of the constraints. Then there is a vector $\lambda \in E^m$
and a vector $\mu \in E^p$ with $\mu \geq 0$ such that

$\nabla f(x^*) + \lambda^T \nabla h(x^*) + \mu^T \nabla g(x^*) = 0$

$\mu^T g(x^*) = 0$.
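
To make the conditions concrete, the sketch below checks them on a small hypothetical problem that is not from the dissertation: minimize $(x_1-2)^2 + (x_2-2)^2$ subject to $h(x) = x_1 - x_2 = 0$ and $g(x) = x_1 + x_2 - 2 \leq 0$. The candidate point (1, 1) with multipliers $\lambda = 0$, $\mu = 2$ is assumed for illustration, and all function names are made up.

```python
def grad_f(x):
    """Gradient of f(x) = (x1 - 2)^2 + (x2 - 2)^2."""
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 2.0)]


def h(x):
    """Equality constraint h(x) = x1 - x2 = 0."""
    return x[0] - x[1]


def grad_h(x):
    return [1.0, -1.0]


def g(x):
    """Inequality constraint g(x) = x1 + x2 - 2 <= 0."""
    return x[0] + x[1] - 2.0


def grad_g(x):
    return [1.0, 1.0]


def kuhn_tucker_holds(x, lam, mu, tol=1e-9):
    """Check feasibility, stationarity, mu >= 0 and complementary slackness."""
    stationarity = [
        df + lam * dh + mu * dg
        for df, dh, dg in zip(grad_f(x), grad_h(x), grad_g(x))
    ]
    return (
        all(abs(s) <= tol for s in stationarity)
        and abs(h(x)) <= tol
        and g(x) <= tol
        and mu >= 0.0
        and abs(mu * g(x)) <= tol
    )


if __name__ == "__main__":
    print(kuhn_tucker_holds([1.0, 1.0], lam=0.0, mu=2.0))   # candidate optimum
    print(kuhn_tucker_holds([0.5, 0.5], lam=0.0, mu=3.0))   # feasible, not optimal
```

The second point satisfies stationarity and feasibility but fails complementary slackness, since its inequality constraint is inactive while the multiplier is positive.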

Second-Order Necessary Conditions. Suppose that the functions $f, g, h \in C^2$ and that
x* is a regular point of the constraints. If x* is a relative minimum point, then there
is a $\lambda \in E^m$, $\mu \in E^p$, $\mu \geq 0$ such that the Kuhn-Tucker conditions hold and such that

$L(x^*) = F(x^*) + \lambda^T H(x^*) + \mu^T G(x^*)$

is positive semidefinite on the tangent subspace of the active constraints at x*.

Second-Order Sufficiency Conditions. Let $f, g, h \in C^2$. Sufficient conditions that a
point x* be a strict relative minimum point are that x* be a regular point of the
constraints, that the Kuhn-Tucker conditions hold, and that the Hessian of the
Lagrangian L(x*) is positive definite on the subspace

$M' = \{y \mid \nabla h(x^*) y = 0,\ \nabla g_j(x^*) y = 0 \text{ for all } j \in J\}$

where

$J = \{j \mid g_j(x^*) = 0,\ \mu_j > 0\}$.


Optimization  by  Dual  Methods 


Convex Duality Theorem: Suppose the problem

min f(x)

subject to $h(x) = 0$, $g(x) \leq 0$

has a local solution at x* with corresponding value r* and Lagrange multipliers $\lambda^*$
and $\mu^* \geq 0$. Suppose also that x* is a regular point of the constraints and the Hessian
of the Lagrangian $L(x) = F(x) + \lambda^T H(x) + \mu^T G(x)$ is positive definite everywhere. Then
x* is a global minimum, and the dual problem

maximize $\varphi(\lambda, \mu)$, $\mu \geq 0$

where

$\varphi(\lambda, \mu) = \min_x [f(x) + \lambda^T h(x) + \mu^T g(x)]$

has a global solution at $\lambda^*$, $\mu^*$ with corresponding value r* and x* as the point
corresponding to $\lambda^*$, $\mu^*$ in the definition of $\varphi$.


Global  Convergence 

Global Convergence Theorem: Let A be an algorithm on X, and suppose that given $x^0$
the sequence $\{x^k\}$ is generated satisfying

$x^{k+1} \in A(x^k)$.

Let a solution set $\Gamma \subset X$ be given, and suppose

i. all points $x^k$ are contained in a compact set $S \subset X$,

ii. there is a continuous function Z on X such that

a. if $x \notin \Gamma$, then $Z(y) < Z(x)$ for all $y \in A(x)$,

b. if $x \in \Gamma$, then $Z(y) \leq Z(x)$ for all $y \in A(x)$,

iii. the mapping A is closed at points outside $\Gamma$.

Then the limit of any convergent subsequence of $\{x^k\}$ is a solution.


Spacer  Steps 

Spacer Step Theorem: Suppose B is an algorithm on X which is closed outside the
solution set $\Gamma$. Let Z be a descent function corresponding to B and $\Gamma$. Suppose that
the sequence $\{x^k\}$ is generated satisfying

$x^{k+1} \in B(x^k)$

for k in an infinite index set K, and that

$Z(x^{k+1}) \leq Z(x^k)$

for all k. Suppose also that the set $S = \{x \mid Z(x) \leq Z(x^0)\}$ is compact. Then the limit
of any convergent subsequence of $\{x^k\}_K$ is a solution.

Note: All theorems in Appendix C are taken from Luenberger [26].


GLOBAL CONVERGENCE OF THE
GOLDSTEIN  AND  ARMIJO  PROCEDURES 


Consider the optimization problem

$\text{minimize}_{x \in X}\ f(x)$,

$X = \{x \mid x \in R^n,\ g(x) \leq 0\}$  (D.1)

where $f: R^n \to R$ and $g: R^n \to R^m$. Assume the functions f and g are differentiable,
and $X \subset R^n$ is nonempty.

The Frank-Wolfe algorithm generates at each iteration a direction of search $d^i$
with the following properties:

a. $d^i \in R^n$ is bounded, i.e. $\|d^i\| \leq B$ for some B > 0

b. there exists $\mu^c > 0$ such that, for all i, $x^i + \mu d^i \in X$ for all $\mu \in [0, \mu^c]$

c. $\nabla f(x^i) d^i < 0$

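Take any $b \in (0, 1/2)$ and define the following function $h^i: R \to R$

$h^i(\lambda) = [f(x^i + \lambda \mu^i d^i) - f(x^i)] / [\lambda \mu^i \nabla f(x^i) d^i]$

where $\mu^i \in [\mu^c/\rho, \mu^c]$, with $\mu^c$ defined above and $\rho$ any given number bigger
than one.

Armijo procedure:

For some given $\gamma > 1$ pick $\lambda^i = \max\{1, 1/\gamma, \ldots, 1/\gamma^n, \ldots\}$ such that $h^i(\lambda^i) \geq b$
and set $x^{i+1} = x^i + \lambda^i \mu^i d^i$.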
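
The Armijo rule can be sketched in a few lines. This is a minimal illustration rather than the procedure as implemented in [13]: it assumes that h is the ratio of the actual change in f to the change predicted by the first-order term, it omits the feasibility check on the constraint set X, and the quadratic test function, the cap on the number of reductions, and all names are hypothetical.

```python
def armijo_step(f, grad_f, x, d, mu=1.0, b=0.25, gamma=2.0, max_reductions=50):
    """Return (lam, x_next), where lam is the largest value in
    {1, 1/gamma, 1/gamma^2, ...} whose ratio h(lam) is at least b.

    Assumed ratio: h(lam) = [f(x + lam*mu*d) - f(x)] / [lam * mu * grad_f(x).d]
    """
    slope = sum(gi * di for gi, di in zip(grad_f(x), d))
    if slope >= 0.0:
        raise ValueError("d must be a descent direction: grad_f(x).d < 0")
    fx = f(x)
    lam = 1.0
    for _ in range(max_reductions):
        trial = [xi + lam * mu * di for xi, di in zip(x, d)]
        ratio = (f(trial) - fx) / (lam * mu * slope)
        if ratio >= b:
            return lam, trial
        lam /= gamma          # try the next smaller candidate step
    raise RuntimeError("no acceptable step found")


def quadratic(x):
    """Hypothetical test function f(x) = x1^2 + 4*x2^2."""
    return x[0] ** 2 + 4.0 * x[1] ** 2


def quadratic_grad(x):
    return [2.0 * x[0], 8.0 * x[1]]


if __name__ == "__main__":
    start = [1.0, 1.0]
    direction = [-g for g in quadratic_grad(start)]   # steepest descent
    lam, nxt = armijo_step(quadratic, quadratic_grad, start, direction)
    print("accepted step:", lam, "new point:", nxt, "f:", quadratic(nxt))
```

The Goldstein rule differs only in also rejecting steps whose ratio exceeds 1 - b, which rules out steps that are too short.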

Goldstein procedure:

If $h^i(1) \geq b$, pick $\lambda^i = 1$. Otherwise pick any $\lambda^i \in \Lambda = \{\lambda \mid \lambda \in (0,1),\ 1-b \geq h^i(\lambda) \geq b\}$.

Set $x^{i+1} = x^i + \lambda^i \mu^i d^i$.

Theorem: Convergence of the Armijo Procedure

Let $f: R^n \to R$ be differentiable and continuous on an open set F containing X,
the constraint set of problem (D.1). Then for any accumulation point $(\hat{x}, \hat{d})$ of the
sequence $\{x^i, d^i\}$, $\nabla f(\hat{x})\hat{d} = 0$ for the Armijo procedure given above.

Theorem: Convergence of the Goldstein Procedure

Let $f: R^n \to R$ be differentiable and continuous on an open set F containing X,
the constraint set of problem (D.1). Then for any accumulation point $(\hat{x}, \hat{d})$ of the
sequence $\{x^i, d^i\}$, $\nabla f(\hat{x})\hat{d} = 0$ for the Goldstein procedure above.

The definitions and theorems in this appendix are taken from Garcia-Palomares [13].
He proves global convergence by the Topkis-Veinott approach [32], showing every
accumulation point is a solution.


DYER'S EXTENSION OF THE FRANK-WOLFE ALGORITHM


a.  Let  z'‘  = + 3'‘  be  substituted  for  Vf(x'‘)  in  the  modified 

Frank-Wolfe  algorithm  (MFW). 


b. 

c. 

d. 


Let $\Gamma(\delta) = \{x \in X \mid f(x) \geq f(y) - \delta\ \ \forall y \in X\}$

Let  c(y)  a max^^^  a'‘Vf(x'‘)y  + 

Dyer  [9]  shows  and  proves  the  following  theorem. 


Theorem.  If  f is  concave  and  differentiable,  X is  compact  and  convex, 

''"’k-^oo  = ^1°°’  '’"’k-^oo  . and  < <».  0 < lla°°ll  < oo. 

then  MFW  either  terminates  at  some  finite  iteration  k and  x'‘  ^ 

where  = max^^^  llx  - ^'‘11,  or  MFW  generates  an  infinite  sequence  {x*'},  and  every 
accumulation  point  of  {x'‘}  is  contained  in  r[(4°°/a°°)ll7i°°ll],  where 

= max^g^  llx  - y°®ll,  and  is  defined  in  (d).

the optimum. Currently existing local procedures use successive linear
approximations;  these  linear  functions  are  poor  preference  models,  so 
the  iterative  procedure  is  slow  and  inefficient.  Since  each  iteration 
requires  a time-consuming  interaction  with  the  decision  maker,  the  slowly 
converging  procedure  is  not  practical. 

This  dissertation  combines  the  desirable  features  of  the  global  and  local 
techniques  in  a new  improved  method.  The  normatively  motivated  preference 
models  of  the  global  procedure  are  incorporated  as  proxy  functions  in  a 
local  procedure.  These  proxies  are  better  models  of  the  true  objective 
than  are  the  linear  approximations,  so  the  resulting  trial  sequence 
reaches  the  optimum  much  faster.  The  new  proxy  approach  yields  rapid 
convergence  without  restrictive  assumptions. 

After  the  theoretical  aspects  of  the  proxy  approach  are  developed,  the 
new  algorithm  is  applied  to  a curriculum  planning  problem.  This 
practical  application  was  successful;  the  decision  maker,  previously 
unfamiliar  with  decision  analysis,  was  able  to  provide  the  assessments 
required  at  each  iteration.  With  the  help  of  various  consistency  tests, 
the  tradeoff  assessments  generated  trial  solutions  that  converged  rapidly 
to  the  optimal  solution.  Numerous  insights  into  the  interactive  use  of 
the algorithm were gained from this practical application.